Splunk Search

How to edit my single regex for parsing multiple types of events in the same sourcetype?

pgadhari
Builder

Hi All,

I want a single regex that handles the multiple types of events generated in my access logs. I have written the following regex to extract fields from my access.log:

^(?P[^ ]+)\s+(?P[^ ]+)\s+(?P[^ ]+)\s+\[(?P[^\]]+)[^ \n]*\s+url="(?P[^ ]+)\s+(?P[^ ]+)\s+(?P[^"]+)"\s+\|status=(?P[^ ]+)\s+\|size=(?P[^ ]+)\s+\|resp_time=(?P[^ ]+)\|\sreferer="(?P[^"]+)"\suser_agent="(?P[^"]+)"

The problem is that most events match this regex, but there are 4 events that show up as "non-matches". Examples of both kinds of event:

Matching:

10.0.0.76 - - [20/Apr/2016:16:41:50 +0000] url="GET /dh/en-US/account/login HTTP/1.1" |status=302 |size=323 |resp_time=176| referer="-" user_agent="Python-httplib2/0.7.0 (gzip)"

Non-matching:

37.28.152.58 - - [19/Apr/2016:20:51:58 +0000] url="myversion|3.6 Public" |status=400 |size=312 |resp_time=125| referer="-" user_agent="-" 

106.184.4.52 - - [17/Apr/2016:18:19:27 +0000] url="SSH-2.0-LYGhost_1.2.7-20100630" |status=302 |size=299 |resp_time=129| referer="-" user_agent="-"

In the non-matching events above, the http_method and protocol fields are not present. Is there a way to add conditions to the regex so that it works for both kinds of event? Please help.

Thanks
PG


lguinn2
Legend

You can have multiple field extractions for the same sourcetype, no problem. For each event, Splunk will attempt to apply all the regular expressions, and will use all of them that match.
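To illustrate the idea outside Splunk, here is a minimal Python sketch that applies two independent patterns to each event and keeps the fields from every pattern that matches. The field names (other than http_method and protocol, which the question mentions) and the helper `extract_fields` are assumptions made for illustration. In Splunk itself each pattern would be its own search-time extraction rather than Python code.

```python
import re

# Hypothetical extraction 1: fields that appear in every event.
COMMON_RE = re.compile(
    r'^(?P<clientip>\S+)\s+\S+\s+\S+\s+\[(?P<timestamp>[^\]]+)\]\s+'
    r'url="(?P<url>[^"]*)"\s+\|status=(?P<status>\d+)'
)

# Hypothetical extraction 2: unanchored, matches only when url holds a
# full request line such as "GET /path HTTP/1.1".
REQUEST_RE = re.compile(
    r'url="(?P<http_method>[A-Z]+)\s+(?P<uri>\S+)\s+(?P<protocol>HTTP/[\d.]+)"'
)


def extract_fields(event):
    """Apply every pattern and merge the fields from those that match."""
    fields = {}
    for pattern in (COMMON_RE, REQUEST_RE):
        match = pattern.search(event)
        if match:
            fields.update(match.groupdict())
    return fields


# Sample events copied from the question.
WEB_EVENT = ('10.0.0.76 - - [20/Apr/2016:16:41:50 +0000] '
             'url="GET /dh/en-US/account/login HTTP/1.1" |status=302 |size=323 '
             '|resp_time=176| referer="-" user_agent="Python-httplib2/0.7.0 (gzip)"')
SSH_EVENT = ('106.184.4.52 - - [17/Apr/2016:18:19:27 +0000] '
             'url="SSH-2.0-LYGhost_1.2.7-20100630" |status=302 |size=299 '
             '|resp_time=129| referer="-" user_agent="-"')

print(extract_fields(WEB_EVENT))
print(extract_fields(SSH_EVENT))
```

With this split, the event without a request line still gets its common fields, and http_method, uri and protocol are simply absent from it.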

Also, regular expressions in Splunk are unanchored, so a pattern can match anywhere in the event and does not have to cover the whole line.

Finally, if you really want to use such a complex regular expression, I suggest that you use a regular expression tool to test it thoroughly.
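As a rough example of that kind of testing, the Python sketch below runs one candidate single-regex approach against the three sample events from the question. It is not the original poster's regex: the capture group names are assumptions (the names were lost when the regex was pasted), and the alternation inside url="..." with a fallback group called `raw_url` is just one possible way to tolerate events that have no method or protocol.

```python
import re

# Candidate single regex. Inside url="...", try a full request line first
# and fall back to capturing the whole value as raw_url (hypothetical name).
ACCESS_RE = re.compile(
    r'^(?P<clientip>\S+)\s+(?P<ident>\S+)\s+(?P<user>\S+)\s+'
    r'\[(?P<timestamp>[^\]]+)\]\s+'
    r'url="(?:(?P<http_method>[A-Z]+)\s+(?P<uri>\S+)\s+(?P<protocol>HTTP/[\d.]+)'
    r'|(?P<raw_url>[^"]*))"\s+'
    r'\|status=(?P<status>\d+)\s+'
    r'\|size=(?P<size>\S+)\s+'
    r'\|resp_time=(?P<resp_time>[^|\s]+)\|\s+'
    r'referer="(?P<referer>[^"]*)"\s+'
    r'user_agent="(?P<user_agent>[^"]*)"'
)

# Sample events copied from the question.
EVENTS = [
    '10.0.0.76 - - [20/Apr/2016:16:41:50 +0000] '
    'url="GET /dh/en-US/account/login HTTP/1.1" |status=302 |size=323 '
    '|resp_time=176| referer="-" user_agent="Python-httplib2/0.7.0 (gzip)"',
    '37.28.152.58 - - [19/Apr/2016:20:51:58 +0000] '
    'url="myversion|3.6 Public" |status=400 |size=312 '
    '|resp_time=125| referer="-" user_agent="-"',
    '106.184.4.52 - - [17/Apr/2016:18:19:27 +0000] '
    'url="SSH-2.0-LYGhost_1.2.7-20100630" |status=302 |size=299 '
    '|resp_time=129| referer="-" user_agent="-"',
]


def parse(event):
    """Return the captured fields as a dict, or None if the event does not match."""
    match = ACCESS_RE.search(event)
    return match.groupdict() if match else None


for event in EVENTS:
    fields = parse(event)
    print('MATCH' if fields else 'NO MATCH', fields)
```

Groups on the branch that did not match come back as None in Python. The pattern uses the `(?P<name>...)` syntax that PCRE also accepts, but it should still be checked inside Splunk before being relied on.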

You might also want to read the manual entries for creating and maintaining search-time field extractions.

lguinn2
Legend

Why must this be a single regex?

This is hard to understand, and that makes it fragile and hard to maintain - not to mention hard to get right in the first place!


pgadhari
Builder

So how can this be achieved? Can you please guide me? Both kinds of event are from the same sourcetype, so how can we write different regexes for different events and apply them?


sundareshr
Legend

It will also make it hard to troubleshoot.


pgadhari
Builder

Somehow the regex I copied above is not showing the capture group names. Pasting the regex again:

^(?P[^ ]+)\s+(?P[^ ]+)\s+(?P[^ ]+)\s+\[(?P[^\]]+)[^ \n]*\s+url="(?P[^ ]+)\s+(?P[^ ]+)\s+(?P[^"]+)"\s+\|status=(?P[^ ]+)\s+\|size=(?P[^ ]+)\s+\|resp_time=(?P[^ ]+)\|\sreferer="(?P[^"]+)"\suser_agent="(?P[^"]+)"\siPlanetDirectoryPro="(?P[^"]+)"