Extract username and domain from field order depends on delimiter

solarboyz1 · ‎11-16-2012

We have a Username field which we are extracting via search time rex.

| rex field=_raw "User (?<UserName>\S+)"

The UserName field contains the UserID and Domain. However, we are in the middle of transitioning authentication systems, so how the username is split depends on which authentication system was used. The UserName field will be in one of two formats:

DomainName\UserID
first.last@DomainName

Before the transition, I had been using:
rex field=UserName "(?<domain>\w+)\W+(?<userid>\w+)"

For the new values this is the rex I've got working:
rex field=UserName "(?<userid>[\w.]+)\@(?<domain>\w+)"

What is the best way to extract the values based on the format? Should I use an if statement or case?

Any tips appreciated.

Rob · ‎11-16-2012

Have you considered using alternation to match both possibilities with rex?
For example, something like the following:

rex field= UserName "(?<##domain>\w+)\W+(?<##userid>\w+)|(?<##userid>[\w.]+)@(?<##domain>\w+)"

(Please remove the ## from the above regex; it's my workaround for the Splunk Answers auto-formatting.)

bmacias84 · ‎11-19-2012

I believe this is what I was describing about identically named capture groups. I would create a field extraction for each format, prefixing the field names with netbios_ or fqdn_, and then use an automatic lookup table to normalize the data.

solarboyz1 · ‎11-19-2012

That's not working for me. I get the following error:

Error in 'rex' command: Encountered the following error while compiling the regex '(?<domain>\w+)\W+(?<userid>\w+)|(?<userid>[\w.]+)@(?<domain>\w+)': Regex: two named subpatterns have the same name

Rob · ‎11-16-2012

@bmacias84: The alternation should take care of that. It basically says "Try and match and remember what's on the left side of the pipe character. If that fails, try and match and remember what's on the right side of the pipe character."
It's not overloading the regex capture group and therefore PCRE and Python should be happy with it.

Rob · ‎11-16-2012

@jonuwz:
Thanks, I figured that would be a lot simpler than trying to use a complicated eval with if, case, and match functions. 🙂

bmacias84 · ‎11-16-2012

You may encounter errors with that regex. In general, PCRE and Python do not allow identically named capture groups.
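
The Python half of that statement is easy to check outside Splunk. The following is a minimal sketch using Python's standard re module, not Splunk's PCRE engine, so it only illustrates the idea. The sample account values come from the question; the function name split_account, the netbios_/fqdn_ group names, and the literal backslash delimiter for the old format are assumptions made for illustration. It shows that the duplicate-name pattern is rejected, and that giving each alternative its own group names and merging them afterwards works:

```python
import re

# Reusing a group name on both sides of the alternation is rejected by Python's re.
duplicate = r"(?P<domain>\w+)\W+(?P<userid>\w+)|(?P<userid>[\w.]+)@(?P<domain>\w+)"
try:
    re.compile(duplicate)
    compiled_ok = True
except re.error:
    compiled_ok = False

# Workaround: distinct names per alternative, merged after matching.
# Assumption: the old format is delimited by a single literal backslash.
distinct = re.compile(
    r"(?P<fqdn_userid>[\w.]+)@(?P<fqdn_domain>\w+)"
    r"|(?P<netbios_domain>\w+)\\(?P<netbios_userid>\w+)"
)

def split_account(value):
    """Return {'domain': ..., 'userid': ...} or None if neither format matches."""
    m = distinct.fullmatch(value)
    if m is None:
        return None
    g = m.groupdict()
    # Only one alternative can match, so the other pair of groups is None.
    return {
        "domain": g["fqdn_domain"] or g["netbios_domain"],
        "userid": g["fqdn_userid"] or g["netbios_userid"],
    }
```

In Splunk terms, the merge step would correspond to normalizing the prefixed fields into a single domain and a single userid after extraction.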

jonuwz · ‎11-16-2012

Nice. My brain wouldn't accept that capture groups would work with alternatives.

jonuwz · ‎11-16-2012

What non-word characters split the domain and user in the old format?

If you can devise a regex that'll split the parts of the account (with generic names)

rex field=UserName "(?<part1>[\w.]+)(?<splitter>[^\w.]+)(?<part2>\w+)"

then you can just do

... | eval domain=if(splitter=="@",part2,part1) | eval userid=if(splitter=="@",part1,part2) | ...
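
The same split-then-swap logic can be tried outside Splunk. This is a rough sketch in Python's re module, assuming the two sample formats from the question; the function name split_by_delimiter is hypothetical, and the regex is the generic one above rewritten with Python's (?P<name>...) group syntax:

```python
import re

# Generic pattern: two parts around whatever non-word, non-dot delimiter appears.
PARTS = re.compile(r"(?P<part1>[\w.]+)(?P<splitter>[^\w.]+)(?P<part2>\w+)")

def split_by_delimiter(value):
    """Assign domain and userid according to which delimiter was found."""
    m = PARTS.fullmatch(value)
    if m is None:
        return None
    part1, splitter, part2 = m.group("part1", "splitter", "part2")
    if splitter == "@":
        # New format: userid@domain
        return {"domain": part2, "userid": part1}
    # Old format: domain<delimiter>userid
    return {"domain": part1, "userid": part2}
```

Because the group names are generic and each appears only once, this approach avoids the duplicate-name compile error entirely; the delimiter alone decides which part is the domain.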
