Splunk Search

How do you go about formatting nested JSON that was extracted using the spath command?

wsanderstii
Path Finder

There are all kinds of questions (and not too many answers) about processing nested JSON, either at the source or in search. I have some nested JSON that the spath command can extract fields from, but the display in the Search & Reporting app is still only one JSON level deep. For example:

{   [-] 
     log:    {"message":"looks like we got no XML document","context":{"status":400,"traceId":"aacb332c-e907-352b-9f8b-a72a55d75cd0","path":"somepath","method":"GET","account_id":1234},"level":200,"level_name":"INFO","channel":"lumen","datetime":{"date":"2018-10-17 20:49:01.839792","timezone_type":3,"timezone":"UTC"},"extra":[]}

     stream:     stdout 
     time:   2018-10-17T20:49:01.841051338Z 
}

The spath command successfully extracts the fields in the "log" element, but I'd like to actually see the "log" properly formatted:

{
    "channel": "lumen",
    "context": {
        "account_id": 1234,
        "method": "GET",
        "path": "somepath",
        "status": 400,
        ...etc
    },
    "message": "looks like we got no XML document"
}

Any way to do this in a search?
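(For reference, the kind of formatting I'm after is what a standard JSON pretty-printer produces. A rough Python sketch, using a trimmed-down version of the "log" value above:)

```python
import json

# The "log" field as Splunk shows it: one flat JSON string
# (trimmed to a few keys for brevity)
log = ('{"message":"looks like we got no XML document",'
       '"context":{"status":400,"path":"somepath","method":"GET",'
       '"account_id":1234},"level":200,"channel":"lumen"}')

# This is the formatted view I'd like to see in the search results
print(json.dumps(json.loads(log), indent=4, sort_keys=True))
```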


harsmarvania57
Ultra Champion

Try the configuration below on the Indexer or Heavy Forwarder, whichever the data reaches first from the Universal Forwarder, and remove INDEXED_EXTRACTIONS = json on the Universal Forwarder.

props.conf

[yourSourcetype]
SHOULD_LINEMERGE=true
NO_BINARY_CHECK=true
SEDCMD-removeslash=s/(?:\\"|\\\\")/"/g
SEDCMD-removenewline=s/\\\\n//g
TIME_PREFIX="time":\s"
MAX_TIMESTAMP_LOOKAHEAD=30

In the above configuration I was not able to parse \\n into a real newline, so I removed it using SEDCMD. You will therefore see one long string in the log.message field with no line breaks, which might look ugly, but otherwise Splunk extracts all the fields you require, based on the sample data below.
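To illustrate what the two SEDCMDs are doing, here is a rough Python equivalent of the substitutions, run on a simplified version of the sample event (the key names are cut down for brevity):

```python
import json
import re

# Simplified raw event: the nested object arrives with escaped quotes
raw = r'{ "log": {\"level\":200,\"channel\":\"lumen\"}, "stream": "stdout" }'

# SEDCMD-removeslash = s/(?:\\"|\\\\")/"/g  -> turn \" (and \\") into plain "
cleaned = re.sub(r'(?:\\"|\\\\")', '"', raw)

# SEDCMD-removenewline = s/\\\\n//g  -> drop literal \n escape sequences
cleaned = re.sub(r'\\\\n', '', cleaned)

# After the substitutions the event is valid JSON again
print(json.loads(cleaned)["log"]["channel"])
```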

{ "log":     {\"message\":\"\\n\u003c?xml version=\\\"1.0\\\" encoding=\\\"utf-8\\\"?\u003e\\n\u003c!DOCTYPE\","context":{"status":400,"traceId":"aacb332c-e907-352b-9f8b-a72a55d75cd0","path":"somepath","method":"GET","account_id":1234},"level":200,"level_name":"INFO","channel":"lumen","datetime":{"date":"2018-10-17 20:49:01.839792","timezone_type":3,"timezone":"UTC"},"extra":[]}, "stream": "stdout", "time": "2018-10-17T20:49:01.841051338Z" }

wsanderstii
Path Finder

Great answer. I won't be able to test this for a while, but I am going to reference it in my future configs.


harsmarvania57
Ultra Champion

I have converted my comment to an answer; if it works for you, you can accept it as the answer.


wsanderstii
Path Finder

We do have that set AFAIK. However, a closer look at the raw entry:

{"log":"{\"message\":\"\\n\u003c?xml version=\\\"1.0\\\" encoding=\\\"utf-8\\\"?\u003e\\n\u003c!DOCTYPE...

shows that "log" is actually a string, not a JSON object. You have to feed it into a formatter without the surrounding quotes.
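In other words, outside Splunk you would have to decode it twice. A quick Python sketch with a cut-down version of the raw entry:

```python
import json

# Docker-style raw event: the value of "log" is itself a JSON-encoded string
raw = r'{"log": "{\"level\":200,\"channel\":\"lumen\"}", "stream": "stdout"}'

outer = json.loads(raw)           # first decode: "log" comes out as a str
inner = json.loads(outer["log"])  # second decode: now it's a real object
print(json.dumps(inner, indent=4))
```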

That being said, appending "spath input=log" to the query will extract all the fields in the "log" string; it just won't pretty-print the results.

I don't think "INDEXED_EXTRACTIONS = json" can account for this without some customization.


harsmarvania57
Ultra Champion

Hi @wsanderstii,

If you are ingesting this data into Splunk using a Splunk Universal Forwarder, can you please try the configuration below on your Universal Forwarder?

props.conf

[yourSourcetype]
INDEXED_EXTRACTIONS = json

Then restart the Splunk service on the Universal Forwarder.
