(currently using Splunk 4.3.3 build 128297)
I have poked around the docs covering index-time field extraction and some of the related Q&A, but I decided to ask directly, outlining our situation.
We have a logging facility that several of our future products will use. This facility receives JSON payloads containing key/value pairs like the following (names have been changed to protect the innocent).
{
  "key1" : "value1",
  "key2" : "value2",
  (could contain more pairs)
  "entries" : [
    {
      "key3" : "value3a",
      "key4" : "value4a",
      "key5" : "value5",
      (could contain more pairs)
    },
    {
      "key3" : "value3b",
      "key4" : "value4b",
      "key6" : "value6",
      (could contain more pairs)
    },
    (could contain more entries)
  ]
}
When the logging facility receives the example payload above, it turns it into the following two log statements and pushes them to Splunk via TCP.
timestamp key1="value1" key2="value2" key3="value3a" key4="value4a" key5="value5"
timestamp key1="value1" key2="value2" key3="value3b" key4="value4b" key6="value6"
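For anyone trying to reproduce the flattening, here is a minimal Python sketch of the transformation described above: the top-level pairs are copied onto every element of "entries", producing one log line per entry. The function name `flatten_payload` is hypothetical, the timestamp is simply passed in, and it assumes values contain no double quotes and that an entry's value wins if it repeats a top-level key. It is not our actual logging facility code.

```python
import json


def flatten_payload(payload: dict, timestamp: str) -> list:
    """Turn one JSON payload into one key="value" log line per entry."""
    # Top-level pairs (everything except the "entries" list) are shared
    # by every event produced from this payload.
    common = {k: v for k, v in payload.items() if k != "entries"}
    lines = []
    for entry in payload.get("entries", []):
        # Assumption: if an entry repeats a top-level key, the entry's value wins.
        merged = {**common, **entry}
        # Assumption: values never contain double quotes, so no escaping is done.
        pairs = " ".join(f'{k}="{v}"' for k, v in merged.items())
        lines.append(f"{timestamp} {pairs}")
    return lines


if __name__ == "__main__":
    raw = (
        '{"key1": "value1", "key2": "value2", "entries": ['
        '{"key3": "value3a", "key4": "value4a", "key5": "value5"},'
        '{"key3": "value3b", "key4": "value4b", "key6": "value6"}]}'
    )
    for line in flatten_payload(json.loads(raw), "timestamp"):
        print(line)
```

Run as a script, it prints the two log statements shown above, one per entry.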
We are defining "key1" to denote the product/component submitting the data. Its value follows a reverse-DNS-style naming convention, with no real restrictions on the hierarchy other than ensuring it is likely unique across our family of products. For example: "mycompany.product.component" or "mycompany.mydivision.product.component.subcomponent".
The remaining key/value pairs are product specific (i.e., they can be whatever the product wants). In other words, key1 namespaces the rest of the key/value pairs.
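If the naming convention needs to be enforced before events reach Splunk, a check along these lines could run in the logging facility. This is a sketch under hypothetical rules (lowercase letters, digits and inner hyphens per label, at least two dot-separated labels); `is_valid_key1` and `in_namespace` are illustrative names, not anything Splunk provides. The namespace test compares whole labels, so "mycompany.product" contains "mycompany.product.component" but not "mycompany.productx.component".

```python
import re

# Hypothetical convention: each label is lowercase alphanumerics with optional
# inner hyphens, and a key1 value has at least two dot-separated labels.
LABEL = r"[a-z0-9]+(?:-[a-z0-9]+)*"
KEY1_PATTERN = re.compile(rf"{LABEL}(?:\.{LABEL})+")


def is_valid_key1(value: str) -> bool:
    """Return True if value looks like a reverse-DNS style key1."""
    return KEY1_PATTERN.fullmatch(value) is not None


def in_namespace(key1: str, prefix: str) -> bool:
    """Return True if key1 equals prefix or lies below it, label by label."""
    labels, prefix_labels = key1.split("."), prefix.split(".")
    return labels[: len(prefix_labels)] == prefix_labels


if __name__ == "__main__":
    print(is_valid_key1("mycompany.mydivision.product.component.subcomponent"))
    print(in_namespace("mycompany.product.component", "mycompany.product"))
```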
We are considering extracting "key1" at index time. I believe doing so would speed up our ability to focus on the events coming from a particular product and/or component out in the field.
Search possibilities...
key1="mycompany.product.*" ...blah...
key1="mycompany.product.component" ...blah...
key1="*.component.*" ...blah...
etc.
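To sanity-check which key1 values each of those searches would select, the wildcard semantics can be approximated offline. This Python sketch treats `*` as "any run of characters, dots included" and everything else as literal, and it matches case-insensitively on the assumption that this mirrors search-time value matching. It models only which values a pattern selects, not how Splunk indexes or tokenizes them; `wildcard_to_regex` and `select` are hypothetical helpers.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    # "*" matches any run of characters (including dots); everything else
    # is escaped and matched literally, case-insensitively.
    parts = (re.escape(p) for p in pattern.split("*"))
    return re.compile(".*".join(parts), re.IGNORECASE)


def select(values, pattern):
    """Return the values the wildcard pattern would match in full."""
    rx = wildcard_to_regex(pattern)
    return [v for v in values if rx.fullmatch(v)]


if __name__ == "__main__":
    values = [
        "mycompany.product.component",
        "mycompany.product",
        "mycompany.mydivision.product.component.subcomponent",
        "mycompany.other.component.x",
    ]
    for pat in ("mycompany.product.*", "mycompany.product.component", "*.component.*"):
        print(pat, select(values, pat))
```

One thing the sketch makes visible: "*.component.*" requires a label after "component", so it skips "mycompany.product.component" itself, and "mycompany.product.*" does not match the bare "mycompany.product".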
Opinions?
Based on this post, it sounds like this may be one of the cases where it does make sense.
Have you considered making key1 the sourcetype or the source? It is a safer solution, and it will still allow you to use metasearch and other fun indexed-field tricks.
I advise against custom indexed fields, mainly because they change the structure of your index compared to your other indexes, and the docs advise against them.