How to create a regex to extract key value pairs

splunkrocks2014
Communicator

I have a data feed in CEF format. Splunk extracts the key-value pairs automatically, except for values that contain whitespace. For instance, given "subject=my testing" in the sample log below, Splunk extracts only "my" as the value of "subject". I can create a custom regex, such as "src=(?P<src>[^\s]+)\sdst=(?P<dst>[^\s]+)\sspt=(?P<spt>[^\s]+)\ssubject=(?P<subject>.+)".

Sep 19 08:26:10 host CEF:0|ESM|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232 subject=my testing

Is there an easy way to fix this issue without creating a custom regex? Thanks.

alemarzu
Motivator

Hi there @splunkrocks2014

Try like this.

Add this to your props.conf, under the stanza for your sourcetype:

REPORT-cefxtractions = cefheaders,cefvaluekeys

Add this to your transforms.conf

[cefheaders]
REGEX = CEF:\s*(?<cef_version>\d+)\|(?<cef_vendor>[^|]*)\|(?<cef_product>[^|]*)\|(?<cef_prodversion>[^|]*)\|(?<cef_ruleid>[^|]*)\|(?<cef_rulename>[^|]*)\|(?<cef_severity>[^|]*)

[cefvaluekeys]
REGEX = (?:_+)?(?<_KEY_1>[\w.:\[\]]+)=(?<_VAL_1>.*?(?=(?:\s[\w.:\[\]]+=|$)))
REPEAT_MATCH = True
CLEAN_KEYS = 1
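
If you want to sanity-check the two patterns outside Splunk first, here is a rough Python sketch run against the sample event from the question. A few assumptions: Python has no `_KEY_1`/`_VAL_1` convention, so the groups are renamed `key` and `val` and the loop stands in for REPEAT_MATCH; the header pattern allows zero or more whitespace after `CEF:` because the sample event has none; and `extract` is just an illustrative helper name, not anything Splunk provides.

```python
import re

# Sample event from the question.
EVENT = ("Sep 19 08:26:10 host CEF:0|ESM|threatmanager|1.0|100|"
         "worm successfully stopped|10|"
         "src=10.0.0.1 dst=2.1.2.2 spt=1232 subject=my testing")

# Python spelling of the [cefheaders] pattern. \s* tolerates the missing
# space after "CEF:" in the sample event.
CEF_HEADER = re.compile(
    r"CEF:\s*(?P<cef_version>\d+)\|(?P<cef_vendor>[^|]*)\|(?P<cef_product>[^|]*)"
    r"\|(?P<cef_prodversion>[^|]*)\|(?P<cef_ruleid>[^|]*)"
    r"\|(?P<cef_rulename>[^|]*)\|(?P<cef_severity>[^|]*)"
)

# Python spelling of the [cefvaluekeys] pattern. The value is matched lazily
# and stops right before the next " key=" token or at the end of the event.
CEF_KV = re.compile(
    r"(?:_+)?(?P<key>[\w.:\[\]]+)=(?P<val>.*?(?=(?:\s[\w.:\[\]]+=|$)))"
)


def extract(event):
    """Return header fields plus every key=value pair found in the event."""
    fields = {}
    header = CEF_HEADER.search(event)
    if header:
        fields.update(header.groupdict())
    # finditer plays the role of REPEAT_MATCH = True.
    for match in CEF_KV.finditer(event):
        fields[match.group("key")] = match.group("val")
    return fields


fields = extract(EVENT)
print(fields["subject"])       # my testing
print(fields["cef_rulename"])  # worm successfully stopped
```

The lookahead is what lets a value contain spaces: the match only ends where something that looks like the next key follows.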

Hope it helps.

woodcock
Esteemed Legend

Try this as a Field Extraction:

\b(c(?>6a|fp|n|s)\d+)Label=(?<_KEY_1>[^=]+)(?=\s+\w+=).*?\1=(?<_VAL_1>[^=]+)(?=\s+\w+=)

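The _KEY_1 and _VAL_1 group names are special in Splunk: the text captured by _KEY_1 becomes the field name, and the text captured by _VAL_1 becomes that field's value.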
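
For anyone who wants to see what that pattern pairs up, here is a small Python sketch. It assumes a hypothetical extension string with a CEF custom string field (`cs1Label` / `cs1`), which is not taken from the original poster's data. The groups are renamed `key` and `val`, and the atomic group `(?>...)` is swapped for a plain non-capturing group so the sketch also runs on Python versions before 3.11.

```python
import re

# Hypothetical CEF extension with a custom string field; not from the thread.
EXTENSION = "cs1Label=Threat Name cs1=Evil Worm src=10.0.0.1"

# Group 1 captures the custom field stem (cs1, cn2, cfp3, c6a1, ...).
# The text after "<stem>Label=" becomes the key, and the text after the
# matching "<stem>=" (found through the \1 backreference) becomes the value.
LABEL_PAIR = re.compile(
    r"\b(c(?:6a|fp|n|s)\d+)Label=(?P<key>[^=]+)(?=\s+\w+=)"
    r".*?\1=(?P<val>[^=]+)(?=\s+\w+=)"
)


def label_pairs(text):
    """Map each custom-field label to the value of its companion field."""
    return {m.group("key"): m.group("val") for m in LABEL_PAIR.finditer(text)}


pairs = label_pairs(EXTENSION)
print(pairs)  # {'Threat Name': 'Evil Worm'}
```

Note that both lookaheads require another key=value pair to follow, so as written the pattern will not match a custom field whose value is the very last thing in the event.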

cpetterborg
SplunkTrust

If you can change the format of the log file to have quotes around the values, then it can be fixed automatically in Splunk.

If you can't change the format of the log, then probably not. In that case you will have to do it with a regular expression using custom field extraction. It would be helpful to know which of the fields might have a space within the value field. There aren't any keys with spaces in the names, are there?

In the case above, where it is the last field, the field extraction is quite easy. The regular expression would be something like:

subject=(?P<subject>.*)$

because it comes at the end of the line. Others will be more difficult, but can be done.

If you can answer the above questions about the data, then a more definitive answer can be provided.

splunkrocks2014
Communicator

Hi cpetterborg, thank you very much for the quick response. There are two issues with the events we are collecting: 1) the source is unable to put double quotes around the values, and 2) whitespace can appear in any of the values.

cpetterborg
SplunkTrust

If your data is going to be delivered into the log in the same order, then you can go with a regex like the following:

src=(?P<src>.*?)\s+dst=(?P<dst>.*?)\s+spt=(?P<spt>.*?)\s+subject=(?P<subject>.*)$
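
As a quick check outside Splunk, this Python sketch runs that regex against the sample event from the question, and against a hypothetical reordered event (made up here for illustration) to show how the fixed-order approach breaks down.

```python
import re

# Sample event from the question.
EVENT = ("Sep 19 08:26:10 host CEF:0|ESM|threatmanager|1.0|100|"
         "worm successfully stopped|10|"
         "src=10.0.0.1 dst=2.1.2.2 spt=1232 subject=my testing")

# Hypothetical event with src and dst swapped; not from the original data.
REORDERED = "dst=2.1.2.2 src=10.0.0.1 spt=1232 subject=my testing"

# Each lazy .*? runs up to the whitespace before the next expected key, so
# values may contain spaces as long as the keys arrive in this exact order.
ORDERED = re.compile(
    r"src=(?P<src>.*?)\s+dst=(?P<dst>.*?)\s+spt=(?P<spt>.*?)"
    r"\s+subject=(?P<subject>.*)$"
)


def extract_ordered(event):
    """Return the four fields, or an empty dict when the order differs."""
    match = ORDERED.search(event)
    return match.groupdict() if match else {}


fields = extract_ordered(EVENT)
print(fields["subject"])           # my testing
print(extract_ordered(REORDERED))  # {}
```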

But if the fields can't be relied upon to arrive in that order, or additional fields may be mixed in, then it becomes much more difficult, perhaps not possible. If you can depend on the order of the fields, though, the task is much simpler (as above).
