Getting Data In

JSON field extraction when portion of the log is JSON format and some values are type array with comma delimeter

jayannah
Builder

Hi All

I have 2 questions.

Question-1:
I wanted to write auto extract KVs from JSON portion of the events.
Here is the JSON portion of sample event --> "actionId":47533796,"clientIP":"127.0.0.1","actions":["Action1","Action2","Action3"],"user":"alia",...

I wrote regex --> \"([\w]+)\":([^,}]+)

Here, the complexity is value type is sometime array of values with comma separator with variable number of members.
So for the key "actions", the above regex gives the values as ["Action1",
Since "," is also delimiter for separating JSON KVs, the array value also uses "," as delimiter, hence the problem.
how can I auto extract all the KVs from JSON string for the above case? I want to put in transforms.conf for auto extraction of all the KVs from JSON string.......

Question-2:
If I want to replace the "," only for array value using SED command, how can I do it?
E,g: I want to replace comma delimiter with semi-colon only for the values of array type[a,brrrr,c2,d,ehh,.....,w] with [a;brrrr;c2;d;ehh;....;w]

Tags (2)
1 Solution

charltones
Explorer

Try this regex:

"([\w]+)":(\[[^\]]+\]|[^,}]+)

It optionally matches a:[b,c,d] or a:b

View solution in original post

charltones
Explorer

Ah understood, in that case maybe a scripted input would be a better option?
http://docs.splunk.com/Documentation/Splunk/6.0/Data/Setupcustominputs

You could write some python (or other script) to pre-process the input files and splunk would take the script output as its input.

0 Karma

charltones
Explorer

I'm not sure that the second part is do-able with a regex. Taking a step back though, have you looked at the spath command (which parses JSON from parts of events). I did this:

| stats count | fields - count | eval jsonpart="{\"actionId\":47533796,\"clientIP\":\"127.0.0.1\",\"actions\":[\"Action1\",\"Action2\",\"Action3\"],\"user\":\"alia\"}" | spath input=jsonpart

where I mimic an event where one field contains the json. spath then does a great job of parsing the json bit. The array comes out as a multi valued field, the other fields are plain fields in the same event. Perhaps that will help?

jayannah
Builder

Hi charltones

Thanks for the response. Yes, I'm very well aware of spath usage and I have implement the above problem in manual search query. My question is related to config for auto extraction of the JSON portion in the transforms.conf, which you have responded earlier. Thanks again.

0 Karma

charltones
Explorer

Try this regex:

"([\w]+)":(\[[^\]]+\]|[^,}]+)

It optionally matches a:[b,c,d] or a:b

jayannah
Builder

Thanks. 1st question is solved.

Any idea on 2nd question?

0 Karma
Get Updates on the Splunk Community!

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...