Knowledge Management

How to avoid double field extraction on a single indexed field?

rgsage
Path Finder

We have the following config, which does index-time extraction of the job field and search-time field extraction of JSON events (KV_MODE=json).
fields.conf

[job]
 INDEXED=true

transforms.conf

 [my_job]
 REGEX = \"job\":\"(?<job>[^\"]+)\"
 FORMAT = job::$1
 WRITE_META = true

props.conf

 [my_json]
 KV_MODE = json
 NO_BINARY_CHECK = true
 SHOULD_LINEMERGE = false
 TIME_PREFIX = \"time\":\"
 TRANSFORMS-job = my_job
 disabled = false

Not surprisingly the job field (only) gets extracted twice, so a search with "... | table job other_field" gives results like this:

job     other_field
---     ------------
job1    other_value1
job1
job2    other_value2
job2

I read here: http://docs.splunk.com/Documentation/Splunk/6.0/Data/Configureindex-timefieldextraction that, since "a field of the same name is extracted at search time", we should set INDEXED=false in fields.conf. This did not seem to help, even for events that were indexed after the change. Also, the [job] setting in fields.conf is shared by other non-JSON source types that are working fine.

Any suggestions?

0 Karma

eugenerinaldi_a
Engager

Sorry in advance to resurrect this thread, but we had a similar issue. Setting AUTO_KV_JSON=false in the corresponding sourcetype stanza in the props.conf file on the search head resolved the issue.
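
A minimal sketch of that change, assuming the stanza is named [my_json] as in the original question (substitute your own sourcetype name):

```ini
# props.conf on the search head
# [my_json] is assumed from the original question; use your own sourcetype stanza
[my_json]
AUTO_KV_JSON = false
```

AUTO_KV_JSON only affects search-time extraction, so it belongs on the search head rather than on the indexers.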

0 Karma

rgsage
Path Finder

Thanks for responding. There is no duplication of keys. The output described above (double values for the "job" field, but not "other_field") can be seen with data like this:

{"time":"2016-01-18T22:35:39.000Z","job":"job1","other_field":"other_value1"}
{"time":"2016-01-19T22:35:39.000Z","job":"job2","other_field":"other_value2"}

I think the problem is that the job field is extracted twice: once by our intentional index-time extraction (as shown in fields/transforms/props.conf), then again at search time by KV_MODE=json. KV_MODE=json works great for all our other JSON fields, but is redundant for the "job" field, which has already been extracted.

0 Karma

woodcock
Esteemed Legend

You are probably doing BOTH KV_MODE=JSON and INDEXED_EXTRACTIONS=JSON. Do only the latter.
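
For reference, a rough sketch of what "only the latter" usually looks like, assuming a sourcetype named [my_json] as in the question. The placement of each stanza is an assumption about a typical forwarder plus search head deployment:

```ini
# props.conf where the structured data is first parsed (assumed: the forwarder)
[my_json]
INDEXED_EXTRACTIONS = json

# props.conf on the search head: turn off search-time JSON extraction
# so the fields are not extracted a second time
[my_json]
KV_MODE = none
```

Keep in mind that INDEXED_EXTRACTIONS = json creates indexed fields for every key in the event, not just selected ones.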

0 Karma

rgsage
Path Finder

Thanks for responding. We are not using INDEXED_EXTRACTIONS as we have large json events with many fields and we don't want all the fields indexed. But the problem is similar to the duplicate fields folks see when using both KV_MODE=json and INDEXED_EXTRACTIONS=json...

We do intentionally index the "job" field only. And this is the only field for which we see the duplicate fields, which makes sense since the "job" field is being extracted at index time and then again at search time (with KV_MODE=json). Of course we want KV_MODE=json for search time field extraction on the many other fields of the json event.

0 Karma

lyndac
Contributor

I am having the exact same issue. I intentionally index two fields (out of 50) in my json event. At search time, the KV_MODE=json does search time extraction of the same field. Did you ever get an answer @rgsage?

0 Karma

rgsage
Path Finder

Thanks for the bump. No, we did not get a solution for this problem. Currently we are just living with the double extraction 😞

0 Karma

jkat54
SplunkTrust

I believe you have two "job" key-value pairs in each of your JSON events...

Like this:
{ "job" : { "job" : "1", "status": "good"}}

The first job has multiple values, the 2nd has a single value.

Please give us an example of a full JSON event (redact sensitive info), so that we may assist you further.

0 Karma