Getting Data In

JSON field extraction when portion of the log is JSON format and some values are type array with comma delimeter

jayannah
Builder

Hi All

I have 2 questions.

Question-1:
I wanted to write auto extract KVs from JSON portion of the events.
Here is the JSON portion of sample event --> "actionId":47533796,"clientIP":"127.0.0.1","actions":["Action1","Action2","Action3"],"user":"alia",...

I wrote regex --> \"([\w]+)\":([^,}]+)

Here, the complexity is value type is sometime array of values with comma separator with variable number of members.
So for the key "actions", the above regex gives the values as ["Action1",
Since "," is also delimiter for separating JSON KVs, the array value also uses "," as delimiter, hence the problem.
how can I auto extract all the KVs from JSON string for the above case? I want to put in transforms.conf for auto extraction of all the KVs from JSON string.......

Question-2:
If I want to replace the "," only for array value using SED command, how can I do it?
E,g: I want to replace comma delimiter with semi-colon only for the values of array type[a,brrrr,c2,d,ehh,.....,w] with [a;brrrr;c2;d;ehh;....;w]

Tags (2)
1 Solution

charltones
Explorer

Try this regex:

"([\w]+)":(\[[^\]]+\]|[^,}]+)

It optionally matches a:[b,c,d] or a:b

View solution in original post

charltones
Explorer

Ah understood, in that case maybe a scripted input would be a better option?
http://docs.splunk.com/Documentation/Splunk/6.0/Data/Setupcustominputs

You could write some python (or other script) to pre-process the input files and splunk would take the script output as its input.

0 Karma

charltones
Explorer

I'm not sure that the second part is do-able with a regex. Taking a step back though, have you looked at the spath command (which parses JSON from parts of events). I did this:

| stats count | fields - count | eval jsonpart="{\"actionId\":47533796,\"clientIP\":\"127.0.0.1\",\"actions\":[\"Action1\",\"Action2\",\"Action3\"],\"user\":\"alia\"}" | spath input=jsonpart

where I mimic an event where one field contains the json. spath then does a great job of parsing the json bit. The array comes out as a multi valued field, the other fields are plain fields in the same event. Perhaps that will help?

jayannah
Builder

Hi charltones

Thanks for the response. Yes, I'm very well aware of spath usage and I have implement the above problem in manual search query. My question is related to config for auto extraction of the JSON portion in the transforms.conf, which you have responded earlier. Thanks again.

0 Karma

charltones
Explorer

Try this regex:

"([\w]+)":(\[[^\]]+\]|[^,}]+)

It optionally matches a:[b,c,d] or a:b

jayannah
Builder

Thanks. 1st question is solved.

Any idea on 2nd question?

0 Karma
Get Updates on the Splunk Community!

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...

New in Observability Cloud - Explicit Bucket Histograms

Splunk introduces native support for histograms as a metric data type within Observability Cloud with Explicit ...