Splunk Search

How can I use regex to match only certain parts of a field string value?

waeleljarrah
Explorer

I am using a CSV lookup table (MyCSVTable) which contains a list of 10 digit numbers (examples: 2345678900, 2134567891, 3126549877, etc...). The CSV can look like this for example:

MyField1,MyField2
2345678900,1
2134567891,1
3126549877,1

I am using MyCSVTable to match against my event data field which also happens to be named MyField1 (same name as in MyCSVTable), and perform a calculation on an associated event data called MyField3.

Part of the problem I have is the MyField3 does not have a standard naming convention. For example, if I am matching MyField1=2345678900 from the CSV, the event data field MyField1 could have any one of these values:
+12345678900 OR 12345678900 OR 12345678900_A123456 OR 2345678900. All of which would be valid matches for my purposes.

Can I use rex or regex to reformat MyField1 in event data such that I am able to successfully match my number against any of these occurrences: +12345678900 OR 12345678900 OR 12345678900_A123456 OR 2345678900?

I tried this but it doesn't work:

index=<...> source=<...> | rex field=MyField1 "(?i)^(?.+?)(\s+1)?$" | lookup MyCSVTable MyField1 OUTPUT MyField2 | where MyField2=1 | stats sum(MyField3) by MyField1

Thank you in advance for your advice.

0 Karma
1 Solution

niketn
Legend

@waeleljarrah, Please try the following run anywhere example to remove unwanted character/s as per your question. Although I am a bit confused with your details for Field1 and Field3 vs your query using the same. I have used regular expression based matches for replace(), which means similar result can also be obtained with rex command as well:

| makeresults
| eval myField1="+12345678900,12345678900,12345678900_A123456,2345678900"
| makemv myField1 delim=","
| mvexpand myField1
| eval inputForCSV=replace(myField1,"^(\+1|1)(\d+)(_.*)?","\2")

So your search query might look like the following:

index=<...> source=<...> 
| stats sum(MyField3) by MyField1
| eval myField1=replace(myField1,"^(\+1|1)(\d+)(_.*)?","\2") 
| lookup MyCSVTable MyField1 OUTPUT MyField2 
| where MyField2=1 

PS: Ideally you should perform transforming command(in your case stats) first before performing a lookup for better query performance. Refer to documentation: http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Lookup#Optimizing_your_lookup_sea...

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

View solution in original post

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

Since all of your examples have a chunk of exactly ten digits long with no foreign characters, you could just do this...

|  rex field=MyField1 "(?<MyField10>\d{10})"

If you need to remove hyphens from the middle of the ten characters, then you would need something more like @niketnilay's examples.

0 Karma

niketn
Legend

@waeleljarrah, Please try the following run anywhere example to remove unwanted character/s as per your question. Although I am a bit confused with your details for Field1 and Field3 vs your query using the same. I have used regular expression based matches for replace(), which means similar result can also be obtained with rex command as well:

| makeresults
| eval myField1="+12345678900,12345678900,12345678900_A123456,2345678900"
| makemv myField1 delim=","
| mvexpand myField1
| eval inputForCSV=replace(myField1,"^(\+1|1)(\d+)(_.*)?","\2")

So your search query might look like the following:

index=<...> source=<...> 
| stats sum(MyField3) by MyField1
| eval myField1=replace(myField1,"^(\+1|1)(\d+)(_.*)?","\2") 
| lookup MyCSVTable MyField1 OUTPUT MyField2 
| where MyField2=1 

PS: Ideally you should perform transforming command(in your case stats) first before performing a lookup for better query performance. Refer to documentation: http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Lookup#Optimizing_your_lookup_sea...

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

niketn
Legend

@waeleljarrah please post your data/SPL using the code button (101010) here on Splunk Answers so that special characters do not escape.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

waeleljarrah
Explorer

@niketnilay Thanks, I now edited original post/question and used code button (101010) accordingly.

0 Karma

cpetterborg
SplunkTrust
SplunkTrust

If you want to reformat the field, what do you want to reformat it to? It is easy to extract the field given the four formats, but what does it need to look like after reformatting? The example seems to indicate that you only want the digits up to the first non-digit from the field, but I'm not quite sure.

0 Karma

waeleljarrah
Explorer

@cpetterborg Thanks, yes the following should all be formatted as equal to 2345678900:

+12345678900 OR 12345678900 OR 12345678900_A123456 OR 2345678900

So, this in essence removes the leading +1, or 1, or trailing _A, and if the number is 10 digits and does not start with +1 or 1 then leave it as is (example: 2345678900).

Please advise, thanks again.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...