I have a data feed in CEF format. Splunk picks up the key-value pairs, except for values that contain whitespace. For instance, for "subject=my testing" in the sample log below, Splunk extracts only "my" as the value of "subject". I can create a custom regex, such as "src=(?P<src>[^\s]+)\sdst=(?P<dst>[^\s]+)\sspt=(?P<spt>[^\s]+)\ssubject=(?P<subject>.+)".
Sep 19 08:26:10 host CEF:0|ESM|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232 subject=my testing
Is there an easy way to fix this issue without creating a custom regex? Thanks.
Hi there @splunkrocks2014
Try this.
Add this to your props.conf, under the stanza for your sourcetype:
REPORT-cefxtractions = cefheaders,cefvaluekeys
Add this to your transforms.conf
[cefheaders]
REGEX = CEF:\s*(?<cef_version>\d+)\|(?<cef_vendor>[^|]*)\|(?<cef_product>[^|]*)\|(?<cef_prodversion>[^|]*)\|(?<cef_ruleid>[^|]*)\|(?<cef_rulename>[^|]*)\|(?<cef_severity>[^|]*)
[cefvaluekeys]
REGEX = (?:_+)?(?<_KEY_1>[\w.:\[\]]+)=(?<_VAL_1>.*?(?=(?:\s[\w.:\[\]]+=|$)))
REPEAT_MATCH = True
CLEAN_KEYS = 1
Hope it helps.
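For anyone who wants to check what the lookahead in [cefvaluekeys] does before deploying it, here is a minimal Python sketch of the same idea, run against the sample event from the question. It is an approximation, not Splunk itself: Python needs the (?P<name>...) syntax, the group names key and val are stand-ins for _KEY_1 and _VAL_1, the optional leading-underscore prefix is left out, and KV_PATTERN and extract_pairs are made-up names.

```python
import re

# Sample event copied from the question.
SAMPLE = (
    "Sep 19 08:26:10 host CEF:0|ESM|threatmanager|1.0|100|"
    "worm successfully stopped|10|"
    "src=10.0.0.1 dst=2.1.2.2 spt=1232 subject=my testing"
)

# A value runs lazily until the next " key=" or the end of the string,
# so whitespace inside a value is kept.
KV_PATTERN = re.compile(
    r"(?P<key>[\w.:\[\]]+)=(?P<val>.*?)(?=\s[\w.:\[\]]+=|$)"
)


def extract_pairs(event):
    """Return the key=value pairs found in the CEF extension part of an event."""
    # The extension is everything after the seventh pipe of the CEF header.
    extension = event.split("|", 7)[-1]
    return {m.group("key"): m.group("val") for m in KV_PATTERN.finditer(extension)}


if __name__ == "__main__":
    for key, value in extract_pairs(SAMPLE).items():
        print(f"{key} = {value!r}")
```

One caveat with this approach: a value that itself contains something shaped like " word=" will be cut short at that point, because the lookahead cannot tell it apart from the next key.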
Try this as a Field Extraction:
\b(c(?>6a|fp|n|s)\d+)Label=(?<_KEY_1>[^=]+)(?=\s+\w+=).*?\1=(?<_VAL_1>[^=]+)(?=\s+\w+=)
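The _KEY_1 and _VAL_1 group names are special: Splunk uses the text captured by _KEY_1 as the field name and the text captured by _VAL_1 as that field's value.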
If you can change the format of the log file to have quotes around the values, then Splunk's automatic key-value extraction will handle them correctly.
If you can't change the format of the log, then probably not. In that case you will have to do it with a regular expression using custom field extraction. It would be helpful to know which of the fields might have a space within the value field. There aren't any keys with spaces in the names, are there?
In the case above, where it is the last field, the field extraction is quite easy. The regular expression would be something like:
subject=(?P<subject>.*)$
because it comes at the end of the line. Others will be more difficult, but can be done.
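To illustrate why the end-of-line position matters, here is a small Python sketch of that same pattern (SUBJECT_AT_END and extract_subject are names made up for this example). It works when subject is the last field, but the greedy .* swallows everything that follows when it is not.

```python
import re

# Greedy match from "subject=" to the end of the line.
SUBJECT_AT_END = re.compile(r"subject=(?P<subject>.*)$")


def extract_subject(event):
    """Return the subject value, or None if the event has no subject field."""
    match = SUBJECT_AT_END.search(event)
    return match.group("subject") if match else None


if __name__ == "__main__":
    # subject is last: the whole value, spaces included, is captured.
    print(extract_subject("src=10.0.0.1 spt=1232 subject=my testing"))
    # subject is not last: the capture also contains the following field.
    print(extract_subject("src=10.0.0.1 subject=my testing dst=2.1.2.2"))
```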
If you can answer the above questions about the data, then a more definitive answer can be provided.
Hi cpetterborg, thank you very much for the quick responses. There are two issues with the events we are collecting: 1) the source is unable to put double quotes around the values, and 2) the whitespace can be in any value.
If your data is going to be delivered into the log in the same order, then you can go with a regex like the following:
src=(?P<src>.*?)\s+dst=(?P<dst>.*?)\s+spt=(?P<spt>.*?)\s+subject=(?P<subject>.*)$
But if you can't rely on the fields arriving in that order, with no additional fields mixed in, then it becomes much more difficult, perhaps not possible. If you can depend on the order of the fields, though, the task is much simpler (as above).
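As a quick check of the fixed-order regex, here is a minimal Python sketch (ORDERED and extract_ordered are made-up names; the test strings other than the sample event are invented). It assumes the four fields always appear as src, dst, spt, subject with nothing in between, and it shows the pattern failing as soon as the order changes.

```python
import re

# Each lazy .*? stops at the next expected key, so values may contain spaces.
ORDERED = re.compile(
    r"src=(?P<src>.*?)\s+dst=(?P<dst>.*?)\s+spt=(?P<spt>.*?)\s+subject=(?P<subject>.*)$"
)


def extract_ordered(event):
    """Return the four fields as a dict, or None if the order does not match."""
    match = ORDERED.search(event)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Expected order, with whitespace inside two of the values.
    print(extract_ordered("src=host one dst=2.1.2.2 spt=1232 subject=my testing"))
    # Same fields in a different order: the pattern no longer matches.
    print(extract_ordered("dst=2.1.2.2 src=host one spt=1232 subject=my testing"))
```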