Solved: How can I use regex to match only certain parts of...

waeleljarrah · ‎12-22-2017

I am using a CSV lookup table (MyCSVTable) which contains a list of 10 digit numbers (examples: 2345678900, 2134567891, 3126549877, etc...). The CSV can look like this for example:

MyField1,MyField2
2345678900,1
2134567891,1
3126549877,1

I am using MyCSVTable to match against my event data field which also happens to be named MyField1 (same name as in MyCSVTable), and perform a calculation on an associated event data called MyField3.

Part of the problem I have is the MyField3 does not have a standard naming convention. For example, if I am matching MyField1=2345678900 from the CSV, the event data field MyField1 could have any one of these values:
+12345678900 OR 12345678900 OR 12345678900_A123456 OR 2345678900. All of which would be valid matches for my purposes.

Can I use rex or regex to reformat MyField1 in event data such that I am able to successfully match my number against any of these occurrences: +12345678900 OR 12345678900 OR 12345678900_A123456 OR 2345678900?

I tried this but it doesn't work:

index=<...> source=<...> | rex field=MyField1 "(?i)^(?.+?)(\s+1)?$" | lookup MyCSVTable MyField1 OUTPUT MyField2 | where MyField2=1 | stats sum(MyField3) by MyField1

Thank you in advance for your advice.

niketn · ‎12-26-2017

@waeleljarrah, Please try the following run anywhere example to remove unwanted character/s as per your question. Although I am a bit confused with your details for Field1 and Field3 vs your query using the same. I have used regular expression based matches for replace(), which means similar result can also be obtained with rex command as well:

| makeresults
| eval myField1="+12345678900,12345678900,12345678900_A123456,2345678900"
| makemv myField1 delim=","
| mvexpand myField1
| eval inputForCSV=replace(myField1,"^(\+1|1)(\d+)(_.*)?","\2")

So your search query might look like the following:

index=<...> source=<...> 
| stats sum(MyField3) by MyField1
| eval myField1=replace(myField1,"^(\+1|1)(\d+)(_.*)?","\2") 
| lookup MyCSVTable MyField1 OUTPUT MyField2 
| where MyField2=1

PS: Ideally you should perform transforming command(in your case stats) first before performing a lookup for better query performance. Refer to documentation: http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Lookup#Optimizing_your_lookup_sea...

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

View solution in original post

DalJeanis · ‎12-26-2017

Since all of your examples have a chunk of exactly ten digits long with no foreign characters, you could just do this...

|  rex field=MyField1 "(?<MyField10>\d{10})"

If you need to remove hyphens from the middle of the ten characters, then you would need something more like @niketnilay's examples.

niketn · ‎12-26-2017

@waeleljarrah, Please try the following run anywhere example to remove unwanted character/s as per your question. Although I am a bit confused with your details for Field1 and Field3 vs your query using the same. I have used regular expression based matches for replace(), which means similar result can also be obtained with rex command as well:

| makeresults
| eval myField1="+12345678900,12345678900,12345678900_A123456,2345678900"
| makemv myField1 delim=","
| mvexpand myField1
| eval inputForCSV=replace(myField1,"^(\+1|1)(\d+)(_.*)?","\2")

So your search query might look like the following:

index=<...> source=<...> 
| stats sum(MyField3) by MyField1
| eval myField1=replace(myField1,"^(\+1|1)(\d+)(_.*)?","\2") 
| lookup MyCSVTable MyField1 OUTPUT MyField2 
| where MyField2=1

PS: Ideally you should perform transforming command(in your case stats) first before performing a lookup for better query performance. Refer to documentation: http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Lookup#Optimizing_your_lookup_sea...

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

niketn · ‎12-23-2017

@waeleljarrah please post your data/SPL using the code button (101010) here on Splunk Answers so that special characters do not escape.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

waeleljarrah · ‎12-26-2017

@niketnilay Thanks, I now edited original post/question and used code button (101010) accordingly.

cpetterborg · ‎12-22-2017

If you want to reformat the field, what do you want to reformat it to? It is easy to extract the field given the four formats, but what does it need to look like after reformatting? The example seems to indicate that you only want the digits up to the first non-digit from the field, but I'm not quite sure.

waeleljarrah · ‎12-26-2017

@cpetterborg Thanks, yes the following should all be formatted as equal to 2345678900:

+12345678900 OR 12345678900 OR 12345678900_A123456 OR 2345678900

So, this in essence removes the leading +1, or 1, or trailing _A, and if the number is 10 digits and does not start with +1 or 1 then leave it as is (example: 2345678900).

Please advise, thanks again.

How can I use regex to match only certain parts of a field string value?

Announcing Scheduled Export GA for Dashboard Studio

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!