We have the following config, which does index-time field extraction of the job field and search-time field extraction of JSON events (KV_MODE=json):
fields.conf
[job]
INDEXED=true

transforms.conf
[my_job]
REGEX = \"job\":\"(?<job>[^\"]+)\"
FORMAT = job::$1
WRITE_META = true

props.conf
[my_json]
KV_MODE = json
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIME_PREFIX = \"time\":\"
TRANSFORMS-job = my_job
disabled = false
Not surprisingly, only the job field gets extracted twice, so a search with "... | table job other_field" gives results like this:
job     other_field
---     ------------
job1    other_value1
job1
job2    other_value2
job2
I read at http://docs.splunk.com/Documentation/Splunk/6.0/Data/Configureindex-timefieldextraction that if "a field of the same name is extracted at search time", we should set INDEXED=false in fields.conf, but this did not seem to help, even for events indexed after the change. Also, the [job] stanza in fields.conf is shared by other, non-JSON sourcetypes that are working fine.
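For reference, the fields.conf change described above is just the following. fields.conf stanzas are keyed by field name, not by sourcetype, which is why the same stanza also affects the non-JSON sourcetypes that index a job field:

```ini
# fields.conf on the search head
# The stanza name is the field name, so this applies to every
# sourcetype with a "job" field, not only to [my_json].
[job]
INDEXED = false
```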
Any suggestions?
Sorry to resurrect this thread, but we had a similar issue. Setting AUTO_KV_JSON=false in the corresponding sourcetype stanza of props.conf on the search head resolved it.
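A minimal sketch of that change, assuming the affected sourcetype is the [my_json] stanza from the question. The commenter did not post their full stanza, so only the AUTO_KV_JSON line below comes from this reply; everything else already in the stanza is assumed to stay as it is:

```ini
# props.conf on the search head
# [my_json] is assumed to be the affected sourcetype; keep the
# stanza's existing settings and add this line.
[my_json]
AUTO_KV_JSON = false
```

The thread does not say whether KV_MODE = json was left in place alongside this setting, so it is worth checking that the other JSON fields are still extracted at search time after the change.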
Thanks for responding. There is no duplication of keys. The output described above (double values for the "job" field, but not for "other_field") can be seen with data like this:
{"time":"2016-01-18T22:35:39.000Z","job":"job1","other_field":"other_value1"}
{"time":"2016-01-19T22:35:39.000Z","job":"job2","other_field":"other_value2"}
I think the problem is that the job field is extracted twice: once by our intentional index-time extraction (as shown in fields.conf, transforms.conf and props.conf above), then again at search time by KV_MODE=json. KV_MODE=json works great for all our other JSON fields, but it is redundant for the "job" field, which has already been extracted.
You are probably doing BOTH KV_MODE=JSON and INDEXED_EXTRACTIONS=JSON. Do only the latter.
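Sketched as config, assuming the sourcetype is [my_json]: INDEXED_EXTRACTIONS is applied where the data is parsed (for structured data that includes a universal forwarder, if one reads the files), while KV_MODE and AUTO_KV_JSON are search-time settings. Turning search-time JSON extraction off keeps the search head from extracting the already-indexed fields a second time:

```ini
# props.conf where the data is parsed (indexer or heavy forwarder,
# or the universal forwarder reading the files)
[my_json]
INDEXED_EXTRACTIONS = json

# props.conf on the search head: disable search-time JSON extraction
# so the indexed fields are not extracted again
[my_json]
KV_MODE = none
AUTO_KV_JSON = false
```

Note that this indexes every field in the JSON event, which is exactly what the original poster wants to avoid.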
Thanks for responding. We are not using INDEXED_EXTRACTIONS as we have large json events with many fields and we don't want all the fields indexed. But the problem is similar to the duplicate fields folks see when using both KV_MODE=json and INDEXED_EXTRACTIONS=json...
We do intentionally index the "job" field only. And this is the only field for which we see the duplicate fields, which makes sense since the "job" field is being extracted at index time and then again at search time (with KV_MODE=json). Of course we want KV_MODE=json for search time field extraction on the many other fields of the json event.
I am having the exact same issue. I intentionally index two fields (out of 50) in my JSON event. At search time, KV_MODE=json extracts the same fields again. Did you ever get an answer, @rgsage?
Thanks for the bump. No, we did not find a solution for this problem. For now we are just living with the double extraction 😞
I believe you have two job key-value pairs in each of your JSON events, like this:
{ "job" : { "job" : "1", "status": "good"}}
The first job has multiple values; the second has a single value.
Please give us an example of a full JSON event (redact sensitive info), so that we may assist you further.