For reasons I can't explain, our SiteMinder-protected web site is logging the user in two different formats: one that has just the simple user name, and another that has the domain prefix. So for a given user, we have web server access logs that contain both "myname" and "MyDom//MyName".
My goal is to "normalize" these so that when I perform stats against user, I get both of these variants aggregated into one count for the one user.
Might seem like a job for a simple RegEx, BUT... notice that the cases are different! "myname" gets logged in lower case, but "MyDom//MyName" gets logged in mixed case. Through RegEx and use of upper(), I've been able to get the two variants to display the same in a report... but they are still getting reported distinctly, with separate counts. I tried to dedup based on the "normalized" value, but then it only returned one of the two variants, with only the counts for that variant (not both of them aggregated).
Any ideas?
What you need is a calculated field that normalizes all the variations, in both format and case, of the user field.
See: http://docs.splunk.com/Documentation/Splunk/6.2.0/Knowledge/definecalcfields
e.g.
props.conf on the Search Head (assuming the existing field name is 'User')
[YourSourceType]
EVAL-User = mvindex(split(upper(User),"/"),-1)
Thanks, somesoni2. Due to the federated roles our large org has regarding Splunk, I am essentially limited to creating searches and stuff within my own app, and can't tweak any of the .conf files. (I do understand why that would be a more strategic method, and maybe sometime I'll navigate the Change Process to make it happen!) But I was able to apply your basic solution within a search, so that's great. Thanks!
Actually, this worked...
| eval user_clean=mvindex(split(upper(user),"//"),-1)
And then I can do stats based on "user_clean" and don't even need to dedup.
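For anyone landing here later, the whole search might look roughly like the sketch below. The index and sourcetype names are placeholders (assumptions, not from my environment), and the field is assumed to be called "user". upper() removes the case difference, split() breaks the value on "//", and mvindex(...,-1) keeps the last piece, so both "myname" and "MyDom//MyName" end up as "MYNAME" and stats counts them together.

```spl
index=your_index sourcetype=your_sourcetype
| eval user_clean=mvindex(split(upper(user),"//"),-1)
| stats count by user_clean
```

Because stats groups on the normalized value, the two variants collapse into a single row, which is why dedup isn't needed.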
I just made it part of the search, so that I don't need to change any .conf files. (I don't have the necessary permissions.)
THANKS!
Actually, you don't need to edit config files to do this. In the Splunk UI, under Settings->Fields you will see an entry for "Calculated fields". If you create the same definition that you put into your search there, assign it to your sourcetype in question and share it globally, the field will be automatically calculated whenever you search for that sourcetype.
Have you considered field aliases?
http://docs.splunk.com/Documentation/Splunk/6.2.1/Knowledge/Addaliasestofields
Also here:
http://answers.splunk.com/answers/110019/using-field-aliases.html
With this, you could create a new custom "combined_user" that you could run reports on with the same counts.
Thanks, gwalford. But as a newbie, I couldn't figure out how this would help. It seems like an alias just gives the field a new name, and I was still stuck on how to modify the value.