Splunk Search

Why are several JSON fields getting extracted more than once at search-time?

mathiask
Communicator

At search time, several fields get extracted more than once, even though they only exist once in the event.
I know I can dedup the search, but that is fighting the symptom, not solving the problem.
The question is: what config do I have to change to get this fixed?

Issue:
The fields "url" and "timestamp" show up twice, with the same value, in the search results:
timestamp = 2015-08-20T12:03:33Z timestamp = 2015-08-20T12:03:33Z
url = http://www.switch.ch/ url = http://www.switch.ch/

Partial example event (in the log it is on a single line):
{
<other stuff>
<other stuff>
<other stuff>
<other stuff>
<other stuff>
timestamp: 2015-08-20T12:03:33Z
<other stuff>
url: http://www.switch.ch/
<other stuff>
}

[sourcetype]
INDEXED_EXTRACTIONS = json
KV_MODE = json
MAX_TIMESTAMP_LOOKAHEAD = -1
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = timestamp
category = Structured
pulldown_type = true


mathiask
Communicator

Okay, I think I have now managed to fix it:

INDEXED_EXTRACTIONS = json
KV_MODE = none
AUTO_KV_JSON = false
TIME_PREFIX = "timestamp"
MAX_TIMESTAMP_LOOKAHEAD = 50
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = timestamp
category = Structured

This seems to extract and index the JSON fields at index time, so no further search-time processing is needed.
With TIME_PREFIX set, I think I can reduce the lookahead.
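To illustrate why TIME_PREFIX lets the lookahead shrink, here is a simplified Python model, assuming (as the props.conf documentation describes) that the lookahead window starts at the end of the TIME_PREFIX match. The function name `find_timestamp`, the ISO-8601 pattern, and the sample event are hypothetical; this is not Splunk's parser, and it ignores how TIMESTAMP_FIELDS interacts with these settings for INDEXED_EXTRACTIONS sourcetypes.

```python
import re
from datetime import datetime

# Hypothetical pattern for ISO-8601 UTC timestamps such as 2015-08-20T12:03:33Z
ISO_UTC = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def find_timestamp(raw, time_prefix, lookahead):
    """Simplified model (assumption, not Splunk code): find the TIME_PREFIX
    regex, then scan at most `lookahead` characters after the end of it."""
    prefix = re.search(time_prefix, raw)
    if prefix is None:
        return None
    # Only this window is searched, mimicking MAX_TIMESTAMP_LOOKAHEAD
    window = raw[prefix.end():prefix.end() + lookahead]
    ts = ISO_UTC.search(window)
    if ts is None:
        return None
    return datetime.strptime(ts.group(0), "%Y-%m-%dT%H:%M:%SZ")


# Made-up one-line event modelled on the example in the question
raw = '{"other": 1, "timestamp":"2015-08-20T12:03:33Z", "url":"http://www.switch.ch/"}'
print(find_timestamp(raw, '"timestamp"', 50))  # window is wide enough
print(find_timestamp(raw, '"timestamp"', 10))  # window cuts the timestamp off
```

Because the window is anchored right after `"timestamp"`, 50 characters comfortably covers the value, whereas a window that is too small misses it entirely.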

Thanks all

somesoni2
Revered Legend

I guess the problem could be with the field extraction you're doing. Based on your sourcetype definition, you're using both INDEXED_EXTRACTIONS (index-time field extraction) and KV_MODE (search-time field extraction), so every field gets extracted twice. I would recommend using search-time field extraction only, so try this for your sourcetype definition:

[sourcetype]
KV_MODE = json
MAX_TIMESTAMP_LOOKAHEAD = -1
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = timestamp
category = Structured
pulldown_type = true
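A toy Python model of the double extraction described above: each enabled JSON pass (index time via INDEXED_EXTRACTIONS, search time via KV_MODE) contributes its own copy of every value, which is how `url = http://www.switch.ch/` ends up listed twice. The function `extract_fields` and its flags are hypothetical and only mimic the symptom; Splunk's actual pipeline is not modelled.

```python
import json
from collections import defaultdict


def extract_fields(raw, indexed_json, kv_mode_json):
    """Toy model (assumption): every enabled JSON extraction pass appends
    its values to a multivalue field map."""
    parsed = json.loads(raw)
    passes = []
    if indexed_json:
        passes.append(parsed)  # index-time pass (INDEXED_EXTRACTIONS = json)
    if kv_mode_json:
        passes.append(parsed)  # search-time pass (KV_MODE = json)
    fields = defaultdict(list)
    for extracted in passes:
        for key, value in extracted.items():
            fields[key].append(value)
    return dict(fields)


event = '{"timestamp": "2015-08-20T12:03:33Z", "url": "http://www.switch.ch/"}'
print(extract_fields(event, indexed_json=True, kv_mode_json=True)["url"])
print(extract_fields(event, indexed_json=False, kv_mode_json=True)["url"])
```

With both passes on, each field carries two identical values; keeping only one pass (either side) leaves a single value.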

mathiask
Communicator

Thanks for your help.
I think this topic, which I have now found, covers it better: http://answers.splunk.com/answers/223095/why-is-my-sourcetype-configuration-for-json-events.html
I caused the issue by using

INDEXED_EXTRACTIONS = json
KV_MODE = json

Changing to

INDEXED_EXTRACTIONS = json
KV_MODE = none
AUTO_KV_JSON = false

fixed it, but now I wonder whether I am indexing all the JSON fields (which might add considerable indexing overhead) instead of only _time, source, host, and sourcetype.
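To catch this combination before deploying, a small Python check could read a props.conf fragment with `configparser` and flag any stanza that uses `INDEXED_EXTRACTIONS = json` without the `KV_MODE = none` / `AUTO_KV_JSON = false` pair that fixed it here. The helper name is hypothetical, the rule is taken from this thread rather than from an official Splunk validator, and `configparser` only approximates the .conf syntax (for example, it rejects duplicate keys within a stanza).

```python
import configparser


def find_double_json_extraction(conf_text):
    """Hypothetical lint: return stanzas that index JSON fields but do not
    switch off search-time JSON extraction the way this thread does."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep Splunk's upper-case setting names
    parser.read_string(conf_text)
    flagged = []
    for name in parser.sections():
        stanza = parser[name]
        indexed_json = stanza.get("INDEXED_EXTRACTIONS", "").strip().lower() == "json"
        kv_mode_none = stanza.get("KV_MODE", "").strip().lower() == "none"
        auto_kv_off = stanza.get("AUTO_KV_JSON", "").strip().lower() in ("false", "0")
        if indexed_json and not (kv_mode_none and auto_kv_off):
            flagged.append(name)
    return flagged


before = "[sourcetype]\nINDEXED_EXTRACTIONS = json\nKV_MODE = json\n"
after = "[sourcetype]\nINDEXED_EXTRACTIONS = json\nKV_MODE = none\nAUTO_KV_JSON = false\n"
print(find_double_json_extraction(before))
print(find_double_json_extraction(after))
```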


koshyk
Super Champion

I think it's extracting OK, but Splunk has already done the timestamp extraction automatically on top of what you specified, hence the duplication. Could you please try:

# props.conf
[sourcetype]
NO_BINARY_CHECK = 1
TIME_PREFIX = "timestamp"
pulldown_type = 1
KV_MODE = json
# Sometimes the setting below is also required.
# BREAK_ONLY_BEFORE = (^{)

mathiask
Communicator

Okay, I will try that.
I had also found the TIME_PREFIX option,
but I did not use it because it does not explain why the url field gets extracted twice.
