Field extraction regex conditional if-then-else

cdstealer
Contributor

Here are two events from an Apache log. I have a field extraction regex that works unless the content type includes a "charset" parameter.

[01/Aug/2014:07:43:48 +0100] 1150 xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx IT 200 GET www.URL.com/images/favicon.ico - "Mozilla/5.0 (Linux; U; Android 4.4.2; en-gb; HTC_Desire_610 Build/KOT49H) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30" text/plain; charset=UTF-8 2148

[01/Aug/2014:07:43:58 +0100] 293 xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx - 200 GET www.URL.com/robots.txt - "Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +go.mail.ru/help/robots)" text/javascript 118943

The regex that works on the second event is:

(?i)^[^\+]*\+\d+\]\s+(?P<bytes>[^ ]+)\s(?P<clientip>[^ ]+)\s(?P<xforward_ip>[^ ]+)\s(?P<cluster_ip>[^ ]+)\s(?P<lang>[^ ]+)\s(?P<response>[^ ]+)\s(?P<method>[^ ]+)\s(?P<uri>[^ ]+)\s(?P<referer>[^ ]+)\s"(?P<useragent>[^"]*?)"\s(?P<mime_type>[^ ]+)\s(?P<response_time>[^ ]+)

What I'm trying to do is have the regex match "text/plain" and, if "; charset=UTF-8" follows, match that in the same group as well.

So my attempt at the regex is:

(?i)^[^\\+]*\\+\\d+\\]\\s+(?P<bytes>[^ ]+)\\s(?P<clientip>[^ ]+)\\s(?P<xforward_ip>[^ ]+)\\s(?P<cluster_ip>[^ ]+)\\s(?P<lang>[^ ]+)\\s(?P<response>[^ ]+)\\s(?P<method>[^ ]+)\\s(?P<uri>[^ ]+)\\s(?P<referer>[^ ]+)\\s\"(?P<useragent>[^\"]*?)\"\\s?(\\w+\\W\\w+\\W\\s\\S+?)(?P<mime_type>[^;]+)|(?P<mime_type>[^ ]+)\\s(?P<response_time>[^ ]+)

The if-then-else statement is ?(\\w+\\W\\w+\\W\\s\\S+?)(?P<mime_type>[^;]+)|(?P<mime_type>[^ ]+) but Splunk gives the error "Regex: two named subpatterns have the same name", which I understand.

Unfortunately I'm a regex noob, so this is my understanding...

?(\\w+\\W\\w+\\W\\s\\S+?) = if(condition)

(?P<mime_type>[^;]+) = then field is

|(?P<mime_type>[^ ]+) = else match

Hope that makes sense 🙂
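
A rough illustration of why this reading does not hold, using Python's re module as a stand-in (an assumption: Splunk uses PCRE, whose details differ). In the attempt, the ? attaches to the preceding \s, so \s?( ... ) means "optional whitespace, then an ordinary capturing group" rather than "if". PCRE writes a conditional as (?(condition)yes|no), and a bare | at the top level splits the entire pattern in two. The helper name compiles is hypothetical.

```python
import re

def compiles(pattern):
    """Return True if the engine accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# `\s?(b)` parses as "optional whitespace, then group (b)", not "if (b)".
optional_ws = re.compile(r'a\s?(b)')
print(optional_ws.fullmatch('ab').group(1))    # whitespace absent
print(optional_ws.fullmatch('a b').group(1))   # whitespace present

# Reduced form of the attempt: one group name on both sides of an alternation.
duplicate = r'(?P<mime_type>[^;]+)|(?P<mime_type>[^ ]+)'
print(compiles(duplicate))  # Python's re rejects the repeated name as well
```

Both fullmatch calls succeed, which shows that the ? only made the whitespace optional.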

1 Solution

somesoni2
Revered Legend

Give this a try

(?i)^[^\+]*\+\d+\]\s+(?P<bytes>[^ ]+)\s(?P<clientip>[^ ]+)\s(?P<xforward_ip>[^ ]+)\s(?P<cluster_ip>[^ ]+)\s(?P<lang>[^ ]+)\s(?P<response>[^ ]+)\s(?P<method>[^ ]+)\s(?P<uri>[^ ]+)\s(?P<referer>[^ ]+)\s\"(?P<useragent>[^\"]*?)\"\s(?P<mime_type>(\w+\/\w+))(.*)\s(?P<response_time>\d+)
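
A minimal sketch for trying this pattern outside Splunk, using Python's re module (an assumption: Splunk uses PCRE, though this pattern uses syntax both engines accept). The test lines are the second posted event and a hypothetical copy of it with the charset suffix appended. The first posted event is not used because, as anonymized, its forwarded-IP position holds a comma-and-space separated list, which [^ ]+ cannot span. The names PREFIX and extract are illustrative only.

```python
import re

# Pattern from the answer above, split across raw-string pieces for readability.
PATTERN = re.compile(
    r'(?i)^[^\+]*\+\d+\]\s+(?P<bytes>[^ ]+)\s(?P<clientip>[^ ]+)'
    r'\s(?P<xforward_ip>[^ ]+)\s(?P<cluster_ip>[^ ]+)\s(?P<lang>[^ ]+)'
    r'\s(?P<response>[^ ]+)\s(?P<method>[^ ]+)\s(?P<uri>[^ ]+)'
    r'\s(?P<referer>[^ ]+)\s\"(?P<useragent>[^\"]*?)\"'
    r'\s(?P<mime_type>(\w+\/\w+))(.*)\s(?P<response_time>\d+)'
)

# Everything up to and including the space after the quoted user agent.
PREFIX = (
    '[01/Aug/2014:07:43:58 +0100] 293 xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx '
    'xxx.xxx.xxx.xxx - 200 GET www.URL.com/robots.txt - '
    '"Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; '
    '+go.mail.ru/help/robots)" '
)
plain = PREFIX + 'text/javascript 118943'
with_charset = PREFIX + 'text/plain; charset=UTF-8 2148'  # hypothetical variant

def extract(line):
    """Return the named fields as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None

for line in (plain, with_charset):
    fields = extract(line)
    print(fields['mime_type'], fields['response_time'])
```

The unnamed (.*) group absorbs "; charset=UTF-8" when it is present, so mime_type holds only the type/subtype pair. Because the subtype is matched with \w+, a subtype containing a character such as "-" or "+" would be cut short at that character.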

cdstealer
Contributor

Perfect! Thank you very much 🙂 Looks like I was over-engineering it.
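
For readers who do want the charset kept in the same field, as the original question asked, one possible variant is an optional non-capturing suffix inside a single named group. This is not from the thread. It is a sketch in Python's re module covering only the tail of the event, and the trailing $ anchor and the name TAIL are assumptions.

```python
import re

# Tail of the event only: closing quote of the user agent, MIME type with an
# optional "; parameter" suffix, then the final numeric field.
TAIL = re.compile(
    r'"\s(?P<mime_type>[^ ;]+(?:;\s[^ ]+)?)\s(?P<response_time>\d+)$'
)

samples = [
    'Safari/534.30" text/plain; charset=UTF-8 2148',
    'robots)" text/javascript 118943',
]

for sample in samples:
    match = TAIL.search(sample)
    print(match.group('mime_type'), '|', match.group('response_time'))
```

Because the suffix is optional rather than an alternative branch, the group name appears only once and the duplicate-name error does not arise.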
