Splunk Search

Why are several JSON fields getting extracted more than once at search-time?

mathiask
Communicator

At search-time, several fields get extracted more than once, even if they only exist once in the event.
I know I can dedup the search, but this is fighting the symptom not solving the problem
The Question is, what config do I have to change to get this fixed?

Issue:
The fields "url" and "timestamp" show up twice with the same value in the search
timestamp = 2015-08-20T12:03:33Z timestamp = 2015-08-20T12:03:33Z
url = http://www.switch.ch/ url = http://www.switch.ch/

Partial Example Event, in the log it is in one line
{
<other stuff>
<other stuff>
<other stuff>
<other stuff>
<other stuff>
timestamp: 2015-08-20T12:03:33Z
<other stuff>
url: http://www.switch.ch/
<other stuff>
}

[sourcetype]
INDEXED_EXTRACTIONS = json
KV_MODE = json
MAX_TIMESTAMP_LOOKAHEAD = -1
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = timestamp
category = Structured
pulldown_type = true

0 Karma
1 Solution

mathiask
Communicator

Okay I think now I managed to fix it

INDEXED_EXTRACTIONS = json
KV_MODE = none
AUTO_KV_JSON = false
TIME_PREFIX = "timestamp"
MAX_TIMESTAMP_LOOKAHEAD = 50
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = timestamp
category = Structured

This seems to export and index the JSON fields at Index Time therefore no later processing needed
With the TIME_PREFIX I think i can reduce the lookahead

Thanks all

View solution in original post

mathiask
Communicator

Okay I think now I managed to fix it

INDEXED_EXTRACTIONS = json
KV_MODE = none
AUTO_KV_JSON = false
TIME_PREFIX = "timestamp"
MAX_TIMESTAMP_LOOKAHEAD = 50
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = timestamp
category = Structured

This seems to export and index the JSON fields at Index Time therefore no later processing needed
With the TIME_PREFIX I think i can reduce the lookahead

Thanks all

somesoni2
SplunkTrust
SplunkTrust

I guess the problem could be with the field extraction you're doing. Based on your sourcetype definition, you're using both INDEXED_EXTRACTION (index time field extraction) and KV_MODE (search time field extraction). With this you get every field extracted twice. I would recommend to use search time field extraction, so try this for your sourcetype definition:-

[sourcetype]
KV_MODE = json
MAX_TIMESTAMP_LOOKAHEAD = -1
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = timestamp
category = Structured
pulldown_type = true

nawazns5038
Builder
0 Karma

mathiask
Communicator

Thank for your help
I think this topic i now found covers it better http://answers.splunk.com/answers/223095/why-is-my-sourcetype-configuration-for-json-events.html
The issue I created by
using

INDEXED_EXTRACTIONS = json
KV_MODE = json

Changing to

INDEXED_EXTRACTIONS = json
KV_MODE = none
AUTO_KV_JSON = false

Fixed it, but now I wonder if where I currently index all the json fields (which might cause quite some indexing) instead of only _time, source, host, sourcetype

0 Karma

koshyk
Super Champion

I think its extracting ok, but Splunk has already done the timestamp extraction automatically on top of what you specified, hence duplicating. Could you please try..

# props.conf   
[sourcetype]
NO_BINARY_CHECK = 1
TIME_PREFIX = "timestamp"
pulldown_type = 1
KV_MODE = JSON
# Sometimes below is required.
# BREAK_ONLY_BEFORE = (^{)
0 Karma

mathiask
Communicator

Okay I will try that ..
I also found the Time_PREFIX option
But I did not use it because it does not explain why the url gets extracted twice

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...