Hi,
I have a field in my existing data set called mso. Within that field are company names
Example
CompanyA
CompanyAA1
CompanyABB1
CoB
CoBBBB1
CompC
CompC2
What I would like to do is create a new field called normalized_mso with normalized company names. so that it looks like the below
CompanyA
CoB
CompC
How could I accomplish this?
This works against your examples.
| makeresults
| eval mso="ExxonUSA,ExxonEurope,ExxonAPAC,ShellUSA,ShellSouthAmerica,ShellNZ,ChevronCalifornia,ChevronIceland,Chevron"
| makemv mso delim=","
| mvexpand mso
| eval mso2=mso
| rex mode=sed field=mso2 "s/(^[A-Z]+?[a-z]+)([A-Z]*?[a-z]*?[a-zA-Z]*?$)/\1/g"
You may need to post any examples you have of companies that have numbers, special characters, or all lower case for testing. This works against a few more scenarios...
| makeresults
| eval mso="ExxonUSA,ExxonEurope,ExxonAPAC,ShellUSA,ShellSouthAmerica,ShellNZ,ChevronCalifornia,ChevronIceland,Chevron,blahblahTheblah,Protection1Green,327weirdness Inc,Iceberg Titanic Company"
| makemv mso delim=","
| mvexpand mso
| eval mso2=mso
| rex mode=sed field=mso2 "s/^([^a-z]*)([a-z0-9]+)([^a-z0-9\n]*)([a-zA-Z0-9 ]*)$/\1\2/g"
@dbcase, if you can use the upper case character in second string as the pattern to end field extraction, you can use the rex command | rex field=<YourFieldNameGoesHere> "(?<mso_new>[A-Z]{1}[^A-Z]+)"
.
Following is a run anywhere search for you to try out:
| makeresults
| eval mso="ExxonUSA,ExxonEurope,ExxonAPAC,ShellUSA,ShellSouthAmerica,ShellNZ,ChevronCalifornia,ChevronIceland,Chevron"
| makemv mso delim=","
| mvexpand mso
| rex field=mso "(?<mso_new>[A-Z]{1}[^A-Z]+)"
PS: The regular expression means start with Capital letter and extract all characters before find next Upper Case character. You can test the same on regex101.com as well.
Hi Niketnilay,
You are a busy man!!! Sadly the real data doesn't necessarily have an uppercase character.
@dbcase 😄 LOL I am on leave today, travelling to hometown 🙂
If there is no pattern your option would be to use lookup with wildcard. Refer to this answer https://answers.splunk.com/answers/52580/can-we-use-wildcard-characters-in-a-lookup-table.html
You need to give us the rules for equation. For example:
All names start with and end with a single capital letter with all-lowercase between.
Everything after the 2nd (ending) capital letter is trash.
Well the names are kinda all over the map, thats why I made the examples variable.
Right now I'm matching using "like" and while it mostly works, what I really need is a new field in the existing data set.
Here is a better example
Company names (made up)
ExxonUSA
ExxonEurope
ExxonAPAC
ShellUSA
ShellSouthAmerica
ShellNZ
ChevronCalifornia
ChevronIceland
Chevron
I'd like to end up with
Exxon
Shell
Chevron
I get the general idea but I need some rules of structure against which to code.