Splunk Field Extraction Error with Edited Regex

adumbrys
Explorer

I have logs which contain a long series of pipe-delimited fields.

My issue is that some fields do not have any values, and instead of some placeholder character being written in place of a NULL field, the field is simply left blank.

For example, this log records various information about site visitors. Some fields are left blank depending on the device and the parts of the website visited.

|wired|||||/||00005wcuu-jSbW_AypQB1ZDLdjH:180ds1m45|Search|||Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36|

But when I adapt the regex to handle this scenario, I still receive "the generated regex was unable to match all examples".

Here is a regex I created in an attempt to generate the fields:

^(?P<FIELDNAME1>[^\|]+)\|(?P<FIELDNAME2>[^\|]+)\|(?P<FIELDNAME3>[^\|]+)\|(?P<FIELDNAME4>[^\|]+)\|(?P<FIELDNAME5>[^\|]+)\|(?P<FIELDNAME6>[^\|]+)\|(?P<FIELDNAME7>[^\|]+)\|(?P<FIELDNAME8>[^\|]+)\|(?P<FIELDNAME9>[^\|]+)\|(?P<FIELDNAME10>[^\|]+)\|(?P<FIELDNAME11>[^\|]+)\|(?P<FIELDNAME12>[^\|]+)

None of the log events will contain pipes within the fields, so I thought it would be simple enough to tell Splunk that anything (even nothing) between two pipes is a field.

Any suggestions are greatly appreciated!

kristian_kolb
Ultra Champion

One problem is that you use the one-or-more quantifier (the plus sign) for your non-pipe character classes, so an empty field can never match. Use the zero-or-more quantifier (the asterisk) or, perhaps better still, don't use regex at all.

Have you looked into REPORT instead of EXTRACT? This allows you to make your extractions by specifying a delimiter, e.g.

props.conf

[your_sourcetype]
REPORT-www = extract_weblog_fields

transforms.conf

[extract_weblog_fields]
DELIMS = "|"
FIELDS = field1, field2, field3, field4

Just name the fields appropriately. Read more on REPORT and FIELD EXTRACTION in the docs.
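
To illustrate the delimiter idea outside Splunk, here is a minimal Python sketch that splits the sample event from the question on the pipe character. It is not how Splunk implements DELIMS, and the `split_fields` helper and the `fieldN` names are hypothetical. How Splunk itself treats the leading pipe and the empty values should be checked against the docs.

```python
# Sample event copied from the question.
event = (
    "|wired|||||/||00005wcuu-jSbW_AypQB1ZDLdjH:180ds1m45|Search|||"
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/34.0.1847.137 Safari/537.36|"
)


def split_fields(line, delim="|"):
    """Split a line on a delimiter, keeping empty fields (hypothetical helper)."""
    values = line.split(delim)
    # Name the fields positionally: field1, field2, ...
    return {f"field{i}": value for i, value in enumerate(values, start=1)}


fields = split_fields(event)
for name, value in fields.items():
    print(f"{name} = {value!r}")
```

Because the event starts and ends with a pipe, the first and last values come out empty here, so the positions of the real fields are shifted by one.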

/K

adumbrys
Explorer

Can I do this when I don't have Splunk installed locally and am just accessing it through the browser?

I definitely need to do more reading on this, because I suppose I could create the config files and submit them to my administrator?

richgalloway
SplunkTrust

By using '+' in your field descriptions you're telling the regex engine there must be at least one character between pipes. Try using '*' instead.
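
The difference can be seen in a small Python sketch using the sample event from the question. Python's `re` module is only a stand-in for Splunk's regex engine here, and `build_pattern` is a hypothetical helper that regenerates the 12-field pattern with either quantifier.

```python
import re

# Sample event copied from the question.
event = (
    "|wired|||||/||00005wcuu-jSbW_AypQB1ZDLdjH:180ds1m45|Search|||"
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/34.0.1847.137 Safari/537.36|"
)


def build_pattern(n_fields, quantifier):
    """Build the pipe-delimited pattern with '+' or '*' (hypothetical helper)."""
    parts = [
        rf"(?P<FIELDNAME{i}>[^\|]{quantifier})" for i in range(1, n_fields + 1)
    ]
    return re.compile("^" + r"\|".join(parts))


plus_pattern = build_pattern(12, "+")  # requires at least one char per field
star_pattern = build_pattern(12, "*")  # allows empty fields

print(plus_pattern.match(event))  # no match: the first field is empty

star_match = star_pattern.match(event)
for name, value in star_match.groupdict().items():
    print(f"{name} = {value!r}")
```

With only 12 capture groups and a leading pipe in the event, the '*' version matches but stops before the user agent string, so the number of groups has to line up with the number of fields in the event.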

adumbrys
Explorer

Thank you! This definitely caught the majority of the delimited fields. Now I just need to make sure each event in the logs is in exactly the same format.

I'm still getting some minor errors, where one pair of pipes isn't being caught properly, and the user agent string in the last field is cut off after the first few letters.

But most importantly, I'm not getting the same problems as before, which is progress!
