Splunk Search

Extract username and domain from a field when the order depends on the delimiter

solarboyz1
Builder

We have a UserName field which we are extracting via a search-time rex.

| rex field=_raw "User (?<UserName>\S+)"

The UserName field contains the UserID and Domain. However, we are in the middle of transitioning authentication systems, so how the username is split depends on which authentication system is used. The UserName field will be in one of two formats:

DomainName\UserID
first.last@DomainName

Before the transition, I had been using:
rex field=UserName "(?<domain>\w+)\W+(?<userid>\w+)"

For the new values, this is the rex I've got working:
rex field=UserName "(?<userid>[\w.]+)\@(?<domain>\w+)"

I was wondering what the best method is to extract the values based on the format. Using an IF statement or Case?

Any tips appreciated.


Rob
Splunk Employee

Have you considered using alternation to match both possibilities with rex?
For example, something like the following:

rex field=UserName "(?<domain>\w+)\W+(?<userid>\w+)|(?<userid>[\w.]+)@(?<domain>\w+)"

bmacias84
Champion

I believe this is what I was saying about identically named capture groups. I would create a field extraction for each format, prefixing the field names with netbios_ or fqdn_, and then use an automatic lookup table to normalize the data.
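
A minimal sketch of the separate-names idea, using eval's coalesce() in place of the lookup table mentioned above (a swapped-in technique, not the poster's exact method). The netbios_* and fqdn_* field names, the sample values, and the assumption that the old-format separator is never "@" or "." are illustrative only:

```spl
| makeresults
``` hypothetical sample values, one per format ```
| eval UserName=split("CORP\\jdoe,first.last@CORP", ",")
| mvexpand UserName
``` each branch of the alternation uses its own group names, so no name is defined twice ```
| rex field=UserName "^(?:(?<netbios_domain>\w+)[^\w.@]+(?<netbios_userid>\w+)|(?<fqdn_userid>[\w.]+)@(?<fqdn_domain>\w+))$"
``` only one pair of fields is populated per event; coalesce picks whichever is non-null ```
| eval domain=coalesce(netbios_domain, fqdn_domain), userid=coalesce(netbios_userid, fqdn_userid)
| table UserName domain userid
```

Because the group names differ between branches, the regex should compile without the duplicate-name complaint, and the normalization step stays in the search rather than in a lookup.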


solarboyz1
Builder

That's not working for me. I get the following error:

Error in 'rex' command: Encountered the following error while compiling the regex '(?<domain>\w+)\W+(?<userid>\w+)|(?<userid>[\w.]+)@(?<domain>\w+)': Regex: two named subpatterns have the same name
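
One possible way around this error, as a sketch only, is to run the two patterns as separate rex commands so that each regex defines every name once. This assumes rex leaves existing fields alone when its pattern does not match, and that the old-format separator is never "@" or "."; the sample values are hypothetical:

```spl
| makeresults
``` hypothetical sample values, one per format ```
| eval UserName=split("CORP\\jdoe,first.last@CORP", ",")
| mvexpand UserName
``` old format: DomainName, a separator, then UserID ```
| rex field=UserName "^(?<domain>\w+)[^\w.@]+(?<userid>\w+)$"
``` new format: first.last@DomainName ```
| rex field=UserName "^(?<userid>[\w.]+)@(?<domain>\w+)$"
| table UserName domain userid
```

The anchors (^ and $) and the "@"-excluding separator class are there so that a given value can match only one of the two patterns.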


Rob
Splunk Employee

@bmacias84: The alternation should take care of that. It basically says "Try and match and remember what's on the left side of the pipe character. If that fails, try and match and remember what's on the right side of the pipe character."
It's not overloading the regex capture group and therefore PCRE and Python should be happy with it.


Rob
Splunk Employee

@jonuwz:
Thanks, I figured that would be a lot simpler than trying to use a complicated eval with if, case, and match functions. 🙂


bmacias84
Champion

You may encounter errors with that regex. In general, PCRE and Python do not allow identically named capture groups.


jonuwz
Influencer

Nice. My brain wouldn't accept that capture groups would work with alternatives.


jonuwz
Influencer

What non-word characters split the domain and user in the old format?

If you can devise a regex that'll split the parts of the account (with generic names):

rex field=UserName "(?<part1>[\w.]+)(?<splitter>[^\w.]+)(?<part2>\w+)"

then you can just do

... | eval domain=if(splitter=="@",part2,part1) | eval userid=if(splitter=="@",part1,part2) | ...
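
Put together, the approach above might look like the following self-contained sketch. The makeresults sample data is hypothetical, and the ^ and $ anchors are an addition that assumes the whole field value is the account name:

```spl
| makeresults
``` hypothetical sample values, one per format ```
| eval UserName=split("CORP\\jdoe,first.last@CORP", ",")
| mvexpand UserName
``` generic names: whatever sits left and right of the separator ```
| rex field=UserName "^(?<part1>[\w.]+)(?<splitter>[^\w.]+)(?<part2>\w+)$"
``` the separator decides which side is the domain and which is the user ```
| eval domain=if(splitter=="@", part2, part1), userid=if(splitter=="@", part1, part2)
| table UserName domain userid
```

Since the regex uses each group name only once, it avoids the duplicate-name restriction entirely; the format decision moves into eval.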