When I index JSON files I get duplicate entries in the Splunk index, and some values are not indexed at all.
Example of the JSON files:
{
"State": "value",
"TimeStarted": "03-jan-2018 10:13:29",
"RBName": "Value",
"Tower": "Value",
"RBType": "Value",
"ManualTimeToExecute": 20,
"RefGUID": "cad8efd8-58c4-4924-add7-78c8f9768b83",
"TicketDetails": {
"TimeData": "03-jan-2018 10:13:30",
"Description": "Value",
"TicketNo": "Value",
"TimeCreated": "03-jan-2018 10:13:12",
"ShortDescription": "Value",
"State": "Value",
"ClientRefNumber": "Value"
},
"Activities": [
{
"LogLevel": "Information",
"LogTime": "03-jan-2018 10:13:31",
"Completion": "Success",
"Severity": "GOOD",
"ImpactedUser": "Value",
"Condition": "GOOD",
"LogMessage": " Value",
"ActionTaskName": "Value"
}
],
"Comment": "Value",
"Completion": "Success",
"Condition": "BAD",
"EndTime": "03-jan-2018 10:13:57",
"Severity": "WARNING"
}
Each JSON file contains one array, which can hold up to 30 items, and the file name of each JSON file is unique.
The result of indexing the JSON files is:
I use Splunk version 7.1 and the default _json source type to index the files. The JSON files are stored in a folder on the same server where Splunk is installed.
Any idea how to fix the duplicate entries in the index and why some values are not indexed at all?
Are you sure the duplicate RefGUIDs are incorrect? That would make it sound like events were indexed twice, instead of parsed twice.
And the Condition may not necessarily be wrong either. Do you see a single event that has the Condition value in the JSON but not parsed out by Splunk?
I tried a complete reinstall of Splunk, same results. 😞
What I noticed is that the JSON files that are missing some values are all indexed only up to roughly the first 160 rows; somehow Splunk doesn't index the complete JSON file. Is there a limit somewhere that I need to increase? Some of the JSON files are up to 500-600 rows long.
I fixed the missing values by adding the following settings to the _json source type:
- TRUNCATE = 0
- MAX_EVENTS = 1000
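For anyone following along, the two settings would look roughly like this as a props.conf override. This is only a sketch: the file location (a local props.conf such as $SPLUNK_HOME\etc\system\local\props.conf) is an assumption on my part, and the values are simply the ones listed in this post.

```
# props.conf (assumed location: $SPLUNK_HOME\etc\system\local\props.conf)
[_json]
# 0 disables line truncation (the default limit is 10000 bytes per line)
TRUNCATE = 0
# Maximum number of input lines added to a single event (the default is 256)
MAX_EVENTS = 1000
```

As far as I know these are index-time settings, so they only affect data indexed after the change and a Splunk restart; files that were already indexed stay truncated until they are re-indexed.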
Now the complete JSON files get indexed, but still twice.
Any idea how to get rid of the duplicate JSON events?
It really looks like all files are indexed twice instead of parsed twice. I started over with a clean index, and right after indexing starts you can see that the same file is indexed twice (check the file name GUID in screenshot 3).
Yes, I checked the original JSON files and they all contain a value in the Condition field.
Check the output of splunk list monitor to see if the file somehow shows up twice.
This outputs exactly the 78 JSON files that are in the folder.
Will you add the output of:
splunk btool props list _json --debug
(From your screenshot it looks like the sourcetype is _json)
Here is the output:
There is certainly nothing in there that I'd expect to be causing this. Can you also send the inputs.conf responsible for this data?
I am testing on a clean install of Splunk.
Inputs.conf in splunk\etc\apps\search\default:
Inputs.conf in splunk\etc\system\default:
[default]
index = default
_rcvbuf = 1572864
host = $decideOnStartup
evt_resolve_ad_obj = 0
evt_dc_name=
evt_dns_name=
[blacklist:$SPLUNK_HOME\etc\auth]
[monitor://$SPLUNK_HOME\var\log\splunk]
index = _internal
[monitor://$SPLUNK_HOME\var\log\splunk\license_usage_summary.log]
index = _telemetry
[monitor://$SPLUNK_HOME\etc\splunk.version]
_TCP_ROUTING = *
index = _internal
sourcetype=splunk_version
[batch://$SPLUNK_HOME\var\spool\splunk]
move_policy = sinkhole
crcSalt =
[batch://$SPLUNK_HOME\var\spool\splunk...stash_new]
queue = stashparsing
sourcetype = stash_new
move_policy = sinkhole
crcSalt =
[fschange:$SPLUNK_HOME\etc]
pollPeriod = 600
signedaudit=true
recurse=true
followLinks=false
hashMaxSize=-1
fullEvent=false
sendEventMaxSize=-1
filesPerDelay = 10
delayInMills = 100
[udp]
connection_host=ip
[tcp]
acceptFrom=*
connection_host=dns
[splunktcp]
route=has_key:_replicationBucketUUID:replicationQueue;has_key:_dstrx:typingQueue;has_key:_linebreaker:indexQueue;absent_key:_linebreaker:parsingQueue
acceptFrom=*
connection_host=ip
[script]
interval = 60.0
start_by_shell = false
[SSL]
sslVersions = tls1.2
cipherSuite = ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
ecdhCurves = prime256v1, secp384r1, secp521r1
allowSslRenegotiation = true
sslQuietShutdown = false
[script://$SPLUNK_HOME\bin\scripts\splunk-wmi.path]
disabled = 0
interval = 10000000
source = wmi
sourcetype = wmi
queue = winparsing
persistentQueueSize=200MB
[admon]
interval=60
baseline=0
[MonitorNoHandle]
interval=60
[WinEventLog]
interval=60
evt_resolve_ad_obj = 0
evt_dc_name=
evt_dns_name=
[WinNetMon]
interval=60
[WinPrintMon]
interval=60
[WinRegMon]
interval=60
baseline=0
[perfmon]
interval=300
[powershell]
interval=60
[powershell2]
interval=60