How to edit my single regex for parsing multiple types of events in the same sourcetype?

pgadhari
Builder

Hi All,

I want a single regex that handles the several types of events generated in my access logs. I have written the following regex to extract fields from my access.log:

^(?P[^ ]+)\s+(?P[^ ]+)\s+(?P[^ ]+)\s+\[(?P[^\]]+)[^ \n]*\s+url="(?P[^ ]+)\s+(?P[^ ]+)\s+(?P[^"]+)"\s+\|status=(?P[^ ]+)\s+\|size=(?P[^ ]+)\s+\|resp_time=(?P[^ ]+)\|\sreferer="(?P[^"]+)"\suser_agent="(?P[^"]+)"

The problem is that most events match this regex, but 4 events show up as "non-matches". Examples of both kinds of events:

Matching:

10.0.0.76 - - [20/Apr/2016:16:41:50 +0000] url="GET /dh/en-US/account/login HTTP/1.1" |status=302 |size=323 |resp_time=176| referer="-" user_agent="Python-httplib2/0.7.0 (gzip)"

Non-matching:

37.28.152.58 - - [19/Apr/2016:20:51:58 +0000] url="myversion|3.6 Public" |status=400 |size=312 |resp_time=125| referer="-" user_agent="-" 

106.184.4.52 - - [17/Apr/2016:18:19:27 +0000] url="SSH-2.0-LYGhost_1.2.7-20100630" |status=302 |size=299 |resp_time=129| referer="-" user_agent="-"

In the non-matching events above, the http_method and protocol fields are absent. Is there a way to write a condition into the regex so that the same regex works with both kinds of events? Please help.

Thanks
PG


lguinn2
Legend

You can have multiple field extractions for the same sourcetype, no problem. For each event, Splunk will attempt to apply all the regular expressions, and will use all of them that match.

Also, regular expressions in Splunk are unanchored.
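To illustrate the two points above outside Splunk, here is a minimal Python sketch rather than actual Splunk configuration. It assumes two separate extractions: one for the fields every event has, and a second, unanchored one that only matches when the url value has the "METHOD path HTTP/x.y" shape. All field names except http_method and protocol are hypothetical, since the original capture group names were lost, and the patterns are my own simplification of the posted regex.

```python
import re

# Field names are hypothetical, except http_method and protocol from the question.
# Extraction 1: fields present in every event; the url value is kept whole.
COMMON = re.compile(
    r'^(?P<clientip>\S+)\s+(?P<ident>\S+)\s+(?P<user>\S+)\s+'
    r'\[(?P<timestamp>[^\]]+)\]\s+url="(?P<url>[^"]*)"\s+'
    r'\|status=(?P<status>\S+)\s+\|size=(?P<size>\S+)\s+'
    r'\|resp_time=(?P<resp_time>[^|\s]+)\|\s+'
    r'referer="(?P<referer>[^"]*)"\s+user_agent="(?P<user_agent>[^"]*)"'
)
# Extraction 2: unanchored; matches only when url looks like "METHOD path HTTP/x.y".
REQUEST = re.compile(
    r'url="(?P<http_method>[A-Z]+)\s+(?P<uri>\S+)\s+(?P<protocol>HTTP/[\d.]+)"'
)

# The three sample events from the question.
EVENTS = [
    '10.0.0.76 - - [20/Apr/2016:16:41:50 +0000] url="GET /dh/en-US/account/login HTTP/1.1" '
    '|status=302 |size=323 |resp_time=176| referer="-" user_agent="Python-httplib2/0.7.0 (gzip)"',
    '37.28.152.58 - - [19/Apr/2016:20:51:58 +0000] url="myversion|3.6 Public" '
    '|status=400 |size=312 |resp_time=125| referer="-" user_agent="-"',
    '106.184.4.52 - - [17/Apr/2016:18:19:27 +0000] url="SSH-2.0-LYGhost_1.2.7-20100630" '
    '|status=302 |size=299 |resp_time=129| referer="-" user_agent="-"',
]

def extract_fields(event):
    """Apply every pattern and merge the fields from those that match."""
    fields = {}
    for pattern in (COMMON, REQUEST):
        match = pattern.search(event)  # search() does not anchor at the start
        if match:
            fields.update(match.groupdict())
    return fields

for event in EVENTS:
    print(extract_fields(event))
```

With this split, the first event gets http_method, uri and protocol in addition to the common fields, while the other two simply get the common fields with the raw url value.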

Finally, if you really want to use such a complex regular expression, I suggest that you use a regular expression tool to test it thoroughly.
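If a single regex is required, one possible approach is an alternation inside the url value: try the three-part request form first, and fall back to capturing the whole value. The Python sketch below tests that idea against the three sample events. The field names (apart from http_method and protocol) and the raw_url fallback group are hypothetical, and Python's re module is not identical to the PCRE engine Splunk uses, so treat this as something to verify in Splunk rather than a drop-in extraction.

```python
import re

# Hypothetical single pattern; raw_url is a made-up fallback group name.
SINGLE = re.compile(
    r'^(?P<clientip>\S+)\s+\S+\s+\S+\s+\[(?P<timestamp>[^\]]+)\]\s+'
    r'url="(?:(?P<http_method>[A-Z]+)\s+(?P<uri>\S+)\s+(?P<protocol>HTTP/[\d.]+)'
    r'|(?P<raw_url>[^"]*))"\s+'
    r'\|status=(?P<status>\d+)\s+\|size=(?P<size>\S+)\s+'
    r'\|resp_time=(?P<resp_time>\d+)\|\s+'
    r'referer="(?P<referer>[^"]*)"\s+user_agent="(?P<user_agent>[^"]*)"'
)

# The three sample events from the question.
SAMPLES = [
    '10.0.0.76 - - [20/Apr/2016:16:41:50 +0000] url="GET /dh/en-US/account/login HTTP/1.1" '
    '|status=302 |size=323 |resp_time=176| referer="-" user_agent="Python-httplib2/0.7.0 (gzip)"',
    '37.28.152.58 - - [19/Apr/2016:20:51:58 +0000] url="myversion|3.6 Public" '
    '|status=400 |size=312 |resp_time=125| referer="-" user_agent="-"',
    '106.184.4.52 - - [17/Apr/2016:18:19:27 +0000] url="SSH-2.0-LYGhost_1.2.7-20100630" '
    '|status=302 |size=299 |resp_time=129| referer="-" user_agent="-"',
]

def parse(event):
    """Return the captured fields, or None if the event does not match."""
    match = SINGLE.search(event)
    if match is None:
        return None
    # Drop groups from the alternation branch that did not take part in the match.
    return {name: value for name, value in match.groupdict().items() if value is not None}

# Report non-matching samples so gaps in the pattern are easy to spot.
for sample in SAMPLES:
    result = parse(sample)
    print("NO MATCH" if result is None else result, "<-", sample[:40])
```

Running a check like this over a larger sample of real events is a quick way to find the remaining "non-matches" before putting the pattern into production.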

You might also want to read the manual entries on creating and maintaining search-time field extractions.

lguinn2
Legend

Why must this be a single regex?

This is hard to understand, and that makes it fragile and hard to maintain - not to mention hard to get right in the first place!


pgadhari
Builder

So how can this be achieved? Can you please guide me? Both kinds of events are from the same sourcetype, so how can we write different regexes for different events and apply them?


sundareshr
Legend

It will also make it hard to troubleshoot.


pgadhari
Builder

Somehow the regex I copied above is not showing the capture group names. Pasting the regex again:

^(?P[^ ]+)\s+(?P[^ ]+)\s+(?P[^ ]+)\s+\[(?P[^\]]+)[^ \n]*\s+url="(?P[^ ]+)\s+(?P[^ ]+)\s+(?P[^"]+)"\s+\|status=(?P[^ ]+)\s+\|size=(?P[^ ]+)\s+\|resp_time=(?P[^ ]+)\|\sreferer="(?P[^"]+)"\suser_agent="(?P[^"]+)"\siPlanetDirectoryPro="(?P[^"]+)"