We have a Username field which we are extracting via search time rex.
| rex field=_raw "User (?
The Username field will contain the UserID and Domain. However, we are in the middle of transitioning authentication systems, so how the username is split depends on which authentication system is used. The UserName field will be in one of two formats:
DomainName\UserID
first.last@DomainName
Before the transition, I had been using:
rex field= UserName "(?
For the new values this is the rex I've got working:
rex field= UserName "(?
I was wondering what the best method is to extract the values based on the format. Should I use an if statement or case?
Any tips appreciated.
Have you considered using alternation to match both possibilities with rex?
For example, something like the following:
rex field= UserName "(?<##domain>\w+)\W+(?<##userid>\w+)|(?<##userid>[\w.]+)@(?<##domain>\w+)"
(Please remove the ## from the above regex; it's my workaround for the Splunk Answers auto-formatting.)
I believe this is what I was saying about having identically named capture groups. I would create a field extraction for each format and add netbios_ or fqdn_ to the field names, then use an automatic lookup table to normalize the data.
That's not working for me. I get the following error:
Error in 'rex' command: Encountered the following error while compiling the regex '(?
@bmacias84: The alternation should take care of that. It basically says "Try and match and remember what's on the left side of the pipe character. If that fails, try and match and remember what's on the right side of the pipe character."
It's not overloading the regex capture group and therefore PCRE and Python should be happy with it.
@jonuwz:
Thanks, I figured that would be a lot simpler than trying to use a complicated eval with if, case, and match functions. 🙂
You may encounter errors with that regex. In general, PCRE and Python do not allow identically named capture groups.
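One way to keep the alternation idea while avoiding duplicate group names is to give each branch its own names and merge them afterwards. The search below is only a sketch under a few assumptions: the sample values come from the two formats in the question, the netbios_ and fqdn_ prefixes are borrowed from the suggestion earlier in the thread, the separator in the old format is assumed to be any character that is not a word character, a dot, or @, and the pattern is anchored so that "first.last" is not mistaken for the old format.

```spl
| makeresults
| eval UserName=split("DomainName\\UserID,first.last@DomainName", ",")
| mvexpand UserName
``` Each branch of the alternation uses its own group names ```
| rex field=UserName "^(?:(?<netbios_domain>\w+)[^\w.@]+(?<netbios_userid>\w+)|(?<fqdn_userid>[\w.]+)@(?<fqdn_domain>[\w.]+))$"
``` Only one branch matches per event, so take whichever value is present ```
| eval domain=coalesce(netbios_domain, fqdn_domain)
| eval userid=coalesce(netbios_userid, fqdn_userid)
| table UserName domain userid
```

The triple-backtick comments need a reasonably recent Splunk version; delete them if your version rejects them. In a real search, replace the first three lines with your base search and the existing Username extraction.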
Nice. My brain wouldn't accept that capture groups would work with alternatives.
What non-word characters split the domain and user in the old format?
If you can devise a regex that'll split the parts of the account (with generic names):
rex field= UserName "(?<part1>[\w.]+)(?<splitter>[^\w.]+)(?<part2>\w+)"
then you can just do:
... | eval domain=if(splitter=="@",part2,part1) | eval userid=if(splitter=="@",part1,part2) | ...
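To try this approach without touching real data, the two steps can be combined into one standalone search. This is a rough sketch: the makeresults/split/mvexpand lines only generate the two sample values from the question, and it assumes the old format uses a single non-word separator such as a backslash.

```spl
| makeresults
| eval UserName=split("DomainName\\UserID,first.last@DomainName", ",")
| mvexpand UserName
``` Generic extraction: part1, the separator, part2 ```
| rex field=UserName "(?<part1>[\w.]+)(?<splitter>[^\w.]+)(?<part2>\w+)"
``` The separator tells us which side is the domain ```
| eval domain=if(splitter=="@", part2, part1)
| eval userid=if(splitter=="@", part1, part2)
| table UserName splitter domain userid
```

Because the separator is captured as its own field, no duplicate group names are needed, and a third format could be handled later by switching the if to a case on splitter.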