I have JSON log files that I need to pull into my Splunk instance. They have some trash data at the beginning and end that I plan on removing with SEDCMD. My end goal is to clean up the file using SEDCMD, index it properly (line break & timestamp), and auto-parse as much as possible.
The logs are on a system with a UF which sends to the indexers. I'm getting very confused about INDEXED_EXTRACTIONS & KV_MODE. I thought that I would use INDEXED_EXTRACTIONS on the UF props.conf, then everything else I need on my indexers, but the docs state that:
When you forward structured data to an indexer, it is not parsed when it arrives at the indexer, even if you have configured props.conf on that indexer with INDEXED_EXTRACTIONS. Forwarded data skips the following pipelines on the indexer, which precludes any parsing of that data on the indexer...
This leads me to believe that if I use INDEXED_EXTRACTIONS on the UF, it won't apply any of the indexer props...so do I just use INDEXED_EXTRACTIONS on my indexers instead? Or does that only apply if I use one of the pretrained sourcetypes? Some answers I read said to use KV_MODE on the search heads? I'm pretty lost on this one.
I have this written up so far:
inputs.conf ON UF
[monitor://path_to_files]
index = my_json_index
sourcetype = my_custom_sourcetype
props.conf ON IDX
[my_custom_sourcetype]
disabled = false
INDEXED_EXTRACTIONS = JSON
KV_MODE = none
SHOULD_LINEMERGE = false
TRUNCATE = 0
LINE_BREAKER = (,)\{\"type\":\"\w+\",\"id\":\"\d+\",\"eventTime\":\"
TIME_PREFIX = \{\"type\":\"\w+\",\"id\":\"\d+\",\"eventTime\":\"
TIME_FORMAT = %FT%T.%3Q
TIME_ZONE = UTC
SEDCMD-1_del_header = s/.*\"events\":\[//g
SEDCMD-2_clean_eof = s/\(.*\)\]\}/\1/g
Hi!
If you want to use INDEXED_EXTRACTIONS = JSON, you need to set it in props.conf on the UF. You do not need any other line-breaking settings (in fact, I think they will be ignored). But the file you want to read needs to be valid JSON! As far as I remember, it needs to be an array of JSON objects.
If you want to do the line breaking by hand, you need to do it on the indexers as usual.
If you set INDEXED_EXTRACTIONS = JSON on the UF, do not also set KV_MODE=JSON on the SH. Otherwise fields are extracted at index time AND at search time, which gives you fields with duplicated values.
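As a rough sketch of that first option, assuming the file were already valid JSON and reusing the sourcetype name from the question, the split could look something like this:

```ini
# props.conf on the UF (structured parsing happens on the forwarder)
[my_custom_sourcetype]
INDEXED_EXTRACTIONS = json

# props.conf on the search heads (avoid extracting the same fields twice)
[my_custom_sourcetype]
KV_MODE = none
```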
Greetings
Tom
Hi @tom_frotscher ! I think I understand it better now. If I use INDEXED_EXTRACTIONS on my UF, then that will override any props on my indexer. The problem is that my file is JSON format, but it has a non-standard header and footer that I will need to delete via SEDCMD before it's JSON "proper". The UF can't use transforms to clean that up.
Based on what you said, if I were to use INDEXED_EXTRACTIONS on my UF, it may not work because my data isn't JSON-proper (yet).
I believe the solution will then be to do everything on the indexer (no INDEXED_EXTRACTIONS, since I have my own LINE_BREAKER), then use KV_MODE=JSON on the SH. Does that solution make sense, or am I off base on this?
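A minimal sketch of that layout, reusing the sourcetype and regexes from the question. It assumes every event starts with `{"type":...` as in the original LINE_BREAKER. Two things differ from the draft above and are not confirmed anywhere in this thread: the time zone is set with `TZ`, and the footer SEDCMD is rewritten as an anchored match on the assumption that SEDCMD takes Perl-style regex (where `\(` would be a literal parenthesis). Treat it as something to test against a sample file, not a known-good config.

```ini
# props.conf on the indexers (index-time: line breaking, timestamp, cleanup)
[my_custom_sourcetype]
SHOULD_LINEMERGE = false
TRUNCATE = 0
# The captured comma is discarded; each new event then starts at {"type":...
LINE_BREAKER = (,)\{\"type\":\"\w+\",\"id\":\"\d+\",\"eventTime\":\"
TIME_PREFIX = \{\"type\":\"\w+\",\"id\":\"\d+\",\"eventTime\":\"
TIME_FORMAT = %FT%T.%3Q
# Assumption: props.conf names the time zone setting TZ
TZ = UTC
# Strip the non-JSON header from the first event
SEDCMD-1_del_header = s/.*\"events\":\[//
# Assumption: the footer is a literal ]} at the very end of the last event
SEDCMD-2_clean_eof = s/\]\}\s*$//

# props.conf on the search heads (search-time JSON field extraction)
[my_custom_sourcetype]
KV_MODE = json
```

With this split, nothing needs to change on the UF beyond the inputs.conf stanza already shown.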
For what you're doing here, I don't know that I would use INDEXED_EXTRACTIONS. Instead, I would use KV_MODE=json on the search head and put the line breaker settings on the indexers. That said, I want to put together a sample of JSON logs wrapped in an array, wrapped in an object, to try out and play with before giving an answer. My fear is that INDEXED_EXTRACTIONS uses its own line breaker, and that may work against you, but I honestly don't know.
First, some of my favorite references about props settings: