Run the equivalent of an `extract` command on a structured JSON event's subfield

ckarcher · 09-03-2019

We're ingesting structured JSON logs and would like to run the equivalent of the extract command on one of each event's subfields. The events look something like this:

{
    "field1":"value1",
    "field2":"value2",
    "field3":"value3",
    "msg":"field4=value4 field5=value5 field6=value6"
}

The top-level fields (field1, field2, field3, and msg) are all being extracted as expected. However, we'd also like to extract the arbitrary key/value pairs contained in the msg field, ideally at index time so that they're available to all searches. The key/value pairs in the msg field are not known beforehand. Is it still possible to extract them at index time and make them available to searches?

We've been able to achieve the desired result with a search command chain like the following:

...base search...
| rename _raw AS _temp 
| rename msg AS _raw 
| extract pairdelim="?&" kvdelim="=" 
| rename _raw AS msg 
| rename _temp AS _raw

However, we have some dashboards that run lots of searches, and we don't want to hack the above command chain into every individual search query.

ckarcher · 09-05-2019

I was able to solve this by creating two field transforms like the following. One handles values in quotes (e.g., key1="value1 with spaces") and the other handles unquoted values (e.g., key1=value1withoutspaces).

json_msg_transform_with_quotes
(?P<_KEY_1>\w+)="(?P<_VAL_1>[^"]*)"

json_msg_transform_without_quotes
(?P<_KEY_1>\w+)=(?P<_VAL_1>[^"\s]+)

I then wired up two new field extractions that use those transforms on the desired source type, and I'm now seeing all the fields (both those from the raw JSON event as well as those embedded in the msg field) available at query time.
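For readers who prefer editing configuration files directly, the setup described above might look roughly like the sketch below. This is an assumption about the equivalent .conf layout, not the poster's actual configuration, which was not shared and may have been created through the Splunk Web UI. The stanza names and regular expressions come from the post; the extraction class name json_msg_kv is hypothetical, and YOUR_SOURCETYPE is a placeholder. The sketch also assumes the transforms run against the default source key (_raw), so the regexes match the key/value text where it appears inside the raw JSON.

```
# transforms.conf (sketch; stanza names and regexes taken from the post above)
# The _KEY_1 / _VAL_1 capture groups tell Splunk to take the field name
# and the field value from the matched text.
[json_msg_transform_with_quotes]
REGEX = (?P<_KEY_1>\w+)="(?P<_VAL_1>[^"]*)"

[json_msg_transform_without_quotes]
REGEX = (?P<_KEY_1>\w+)=(?P<_VAL_1>[^"\s]+)

# props.conf (sketch; "json_msg_kv" is a hypothetical class name)
# REPORT- wires search-time transforms to the source type.
[YOUR_SOURCETYPE]
REPORT-json_msg_kv = json_msg_transform_with_quotes, json_msg_transform_without_quotes
```

Because these are search-time extractions, they apply to every search against the source type without re-indexing, which is what makes the fields available to the dashboards without modifying each query.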

kamlesh_vaghela · 09-05-2019

@ckarcher,

Can you please try adding the configuration below to props.conf?

File path: SPLUNK_HOME/etc/apps/YOUR_APP/local/props.conf

[YOUR_SOURCETYPE]
EXTRACT-field4,field5,field6 = ^[^=\n]*=(?P<field4>\w+)[^=\n]*=(?P<field5>\w+)[^=\n]*=(?P<field6>\w+)

Note: You may need to adjust the regular expression to suit your events and requirements.

Thanks

ckarcher · 09-05-2019

Per the original post, the names of the key/value pairs in the msg field are arbitrary and unknown beforehand.

kamlesh_vaghela · 09-04-2019

@ckarcher,

You can also try this:

| makeresults
| eval _raw="{\"field1\":\"value1\",\"field2\":\"value2\",\"field3\":\"value3\",\"msg\":\"field4=value4 field5=value5 field6=value6\"}"
| extract
| eval _raw=msg
| extract

ckarcher · 09-04-2019

Hi @kamlesh_vaghela - we've already proven that it's possible to extract the K/V pairs from msg at search time with an extract command like you've provided. However, we have dashboards with lots of searches in them, and we want to avoid hacking the rename + extract command into each of them. Do you know if it's possible to do this in a way that works for all searches against a given source type?

kamlesh_vaghela · 09-05-2019

@ckarcher,

Please check my answer below.

poete · 09-04-2019

Hello @ckarcher,
If the format of msg does not change, you can use rex, as below:

| makeresults 
| eval _raw="{\"field1\":\"value1\",\"field2\":\"value2\",\"field3\":\"value3\",\"msg\":\"field4=value4 field5=value5 field6=value6\"}"
| spath
| rex field=msg "field4=(?<field4>.*) field5=(?<field5>.*) field6=(?<field6>.*)"

ckarcher · 09-04-2019

Hi @poete - the format of the msg field is unknown beforehand. It may contain any number of arbitrary key/value pairs, and we want to extract them all. I've updated the question to reflect this.

