Splunk Search

Regex question extracting user from webserver log

mikelanghorst
Motivator

For this sample data:
172.21.174.78 - "/dc=com/dc=caiso/OU=people/CN=Bob User" [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"
172.21.174.78 - mlanghor [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"
172.21.174.78 - - [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"

For some of our webserver logs, we are logging the DN from the user certificate with %{SSL_CLIENT_S_DN}x.

The default extraction for user is [[nspaces:user]], so essentially (?<user>[^\s]+).
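A rough reproduction outside Splunk shows why the default falls short on the DN line. This is a sketch under two assumptions: Python's `re` stands in for Splunk's PCRE (so the named group is spelled `(?P<user>...)`), and the `^\S+\s+\S+\s+` prefix, which skips the client IP and the ident field, approximates where the access-log extraction applies `[[nspaces:user]]`.

```python
import re

# The three sample events from the question; they share the same tail.
SUFFIX = (' [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0"'
          ' 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"')
LINES = [
    '172.21.174.78 - "/dc=com/dc=caiso/OU=people/CN=Bob User"' + SUFFIX,
    '172.21.174.78 - mlanghor' + SUFFIX,
    '172.21.174.78 - -' + SUFFIX,
]

# Approximation of the default (?<user>[^\s]+). The anchor that skips the
# client IP and ident fields is an assumption of this sketch.
DEFAULT_USER = re.compile(r'^\S+\s+\S+\s+(?P<user>[^\s]+)')


def extract_user(line):
    """Return the token captured by the default-style pattern, or None."""
    m = DEFAULT_USER.match(line)
    return m.group("user") if m else None


for line in LINES:
    print(extract_user(line))
```

On the DN line this yields `"/dc=com/dc=caiso/OU=people/CN=Bob`: the value keeps its opening quote and is cut at the space inside `CN=Bob User`.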

In trying to extract the different variations of the user field, I came up with:

(?<user>([^\"\s]+|\"[^\"]+\"))
But that includes the double quotes as part of the field. I haven't been able to come up with a regex that, when the first character is a double quote, grabs everything between the quotes (but not the quotes themselves), and otherwise grabs everything up to the next space.
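The quote problem with this attempt can be reproduced in a few lines. This is a sketch, not Splunk itself: Python's `re` replaces PCRE, the escaped `\"` are written as plain `"`, and the same assumed `^\S+\s+\S+\s+` anchor skips the client IP and ident fields.

```python
import re

# The three sample events from the question; they share the same tail.
SUFFIX = (' [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0"'
          ' 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"')
LINES = [
    '172.21.174.78 - "/dc=com/dc=caiso/OU=people/CN=Bob User"' + SUFFIX,
    '172.21.174.78 - mlanghor' + SUFFIX,
    '172.21.174.78 - -' + SUFFIX,
]

# The question's pattern with quotes unescaped; the anchor is an assumption.
ATTEMPT = re.compile(r'^\S+\s+\S+\s+(?P<user>([^"\s]+|"[^"]+"))')


def extract_user(line):
    """Return the named 'user' group, or None if the line does not match."""
    m = ATTEMPT.match(line)
    return m.group("user") if m else None


for line in LINES:
    print(extract_user(line))
```

The whole DN now comes through, but because both quote characters sit inside the named group they end up in the value: `"/dc=com/dc=caiso/OU=people/CN=Bob User"`.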


danielschroeder
Engager

You need to work with lookbehinds.

(?<user>(?<=\")[^\"]+|(?<!\")[^\s\"]+)
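A quick way to see how the lookbehind version behaves is to run it in Python's `re` (an approximation of Splunk's PCRE; quotes unescaped). Used on its own, the pattern is unanchored, so a plain search finds the client IP first. The sketch therefore adds an assumed prefix that skips the IP and ident fields and optionally consumes an opening quote, which is what lets the `(?<=")` branch fire on the DN.

```python
import re

# The three sample events from the question; they share the same tail.
SUFFIX = (' [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0"'
          ' 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"')
LINES = [
    '172.21.174.78 - "/dc=com/dc=caiso/OU=people/CN=Bob User"' + SUFFIX,
    '172.21.174.78 - mlanghor' + SUFFIX,
    '172.21.174.78 - -' + SUFFIX,
]

# The lookbehind answer as posted, with quotes unescaped.
LOOKBEHIND = r'(?P<user>(?<=")[^"]+|(?<!")[^\s"]+)'

# Assumed anchor: skip IP and ident, then optionally eat an opening quote.
ANCHORED = re.compile(r'^\S+\s+\S+\s+"?' + LOOKBEHIND)


def extract_user(line):
    """Return the named 'user' group from the anchored pattern, or None."""
    m = ANCHORED.match(line)
    return m.group("user") if m else None


print(re.search(LOOKBEHIND, LINES[0]).group("user"))  # the client IP, not the user
for line in LINES:
    print(extract_user(line))
```

With the anchor in place the three lines give the bare DN, `mlanghor` and `-`; without it, the first match is `172.21.174.78`.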


kristian_kolb
Ultra Champion

Would this work? Unescape the double quotes if needed.

^\S+\s+\S+\s+\"?(?<user>(?:([^\"]+)\"\s|([\S]+)\s+))
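Running this first version through Python's `re` (standing in for PCRE, quotes unescaped) shows where the extra characters come from: the closing `"\s` of the DN branch and the `\s+` of the bare branch are both inside the named group.

```python
import re

# The three sample events from the question; they share the same tail.
SUFFIX = (' [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0"'
          ' 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"')
LINES = [
    '172.21.174.78 - "/dc=com/dc=caiso/OU=people/CN=Bob User"' + SUFFIX,
    '172.21.174.78 - mlanghor' + SUFFIX,
    '172.21.174.78 - -' + SUFFIX,
]

# Kristian's first pattern, quotes unescaped, (?<user>...) spelled (?P<user>...).
FIRST = re.compile(r'^\S+\s+\S+\s+"?(?P<user>(?:([^"]+)"\s|([\S]+)\s+))')


def extract_user(line):
    """Return the named 'user' group, or None if the line does not match."""
    m = FIRST.match(line)
    return m.group("user") if m else None


for line in LINES:
    print(repr(extract_user(line)))
```

The DN comes back as `'/dc=com/dc=caiso/OU=people/CN=Bob User" '`, keeping the closing quote plus a trailing space, and the bare users gain a trailing space as well. The inner `([^"]+)` group on its own holds the clean DN.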

UPDATE:

Played around a little more in RegExr, and this looks good there at least (capture group 1 is OK).

^\S+\s+\S+\s+\"?(?<user>(?:(([^\"]+))|([\S]+)\s+))(?:\"\s\[|\s\[)
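A Python check of the updated pattern (again `re` as an approximation of Splunk's PCRE, quotes unescaped) suggests that moving the terminator `(?:"\s\[|\s\[)` outside the named group keeps the quotes and spaces out of `user`, at least for these three sample lines.

```python
import re

# The three sample events from the question; they share the same tail.
SUFFIX = (' [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0"'
          ' 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"')
LINES = [
    '172.21.174.78 - "/dc=com/dc=caiso/OU=people/CN=Bob User"' + SUFFIX,
    '172.21.174.78 - mlanghor' + SUFFIX,
    '172.21.174.78 - -' + SUFFIX,
]

# Kristian's updated pattern, quotes unescaped, named group in Python syntax.
UPDATED = re.compile(
    r'^\S+\s+\S+\s+"?(?P<user>(?:(([^"]+))|([\S]+)\s+))(?:"\s\[|\s\[)')


def extract_user(line):
    """Return the named 'user' group, or None if the line does not match."""
    m = UPDATED.match(line)
    return m.group("user") if m else None


for line in LINES:
    print(repr(extract_user(line)))
```

Backtracking does the work here: for a bare user, `[^"]+` first runs all the way to the quote before `POST`, then gives characters back until the text that follows is ` [`.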

Wondering if it works,

/Kristian


mikelanghorst
Motivator

Seems closer, but it's retaining the closing quote.


mikelanghorst
Motivator

Finally got one working as I want:

(?:\"(?<user>[^\"]+)\"|(?<user>[^\s]+))

Or not: RegExr and Expresso accept this, but Splunk's rex command fails because the pattern defines the same group name twice.

mikelanghorst
Motivator

While RegExr accepts it just fine, passing this to rex fails with:
Error in 'rex' command: Encountered the following error while compiling the regex '(?:(?:"(?<user>[^"]+)")|(?<user>[^\s]+))': Regex: two named subpatterns have the same name
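Python's `re` enforces the same rule that the rex error reports, so the failure is easy to reproduce. One possible workaround, sketched here as an assumption rather than a tested Splunk recipe, is to give each branch its own name and take whichever one matched. The names `user_quoted` and `user_bare` are hypothetical, and the `^\S+\s+\S+\s+` anchor that skips the IP and ident fields is also assumed.

```python
import re

# The three sample events from the question; they share the same tail.
SUFFIX = (' [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0"'
          ' 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"')
LINES = [
    '172.21.174.78 - "/dc=com/dc=caiso/OU=people/CN=Bob User"' + SUFFIX,
    '172.21.174.78 - mlanghor' + SUFFIX,
    '172.21.174.78 - -' + SUFFIX,
]

# Reusing a group name is rejected at compile time, as in the rex error.
try:
    re.compile(r'(?:"(?P<user>[^"]+)"|(?P<user>[^\s]+))')
    DUPLICATE_REJECTED = False
except re.error:
    DUPLICATE_REJECTED = True

# Workaround sketch: one (hypothetical) name per branch, anchor assumed.
TWO_GROUPS = re.compile(
    r'^\S+\s+\S+\s+(?:"(?P<user_quoted>[^"]+)"|(?P<user_bare>[^\s]+))')


def extract_user(line):
    """Return whichever branch matched, like coalesce(), or None."""
    m = TWO_GROUPS.match(line)
    if not m:
        return None
    # The branch that did not participate in the match yields None.
    return m.group("user_quoted") or m.group("user_bare")


print(DUPLICATE_REJECTED)
for line in LINES:
    print(extract_user(line))
```

In Splunk, the final step would presumably be an `eval` using `coalesce(user_quoted, user_bare)` after the `rex`; adapt the placeholder field names to your own extraction.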
