How to modify my regular expression to extract strings between two pipes?

maximusdm
Communicator

Hello, I need to extract the string between two delimiters. Here are a few sample strings:
(sometimes the delimiter is a pipe "|" and sometimes it is an uppercase letter "I")

ASDSAD ASDASD ASDAS | STRING001 | ASDA ASDASD ASDASDADADA
ASDSAD ASDASD ASDAS I STRING002 I ASDA ASDASD ASDASDADADA

My regular expressions work about 90% of the time:

| rex field="Site Section" ".*\|\s*(?<SiteSection>.*)\s*\|"   
| rex field="Site Section" ".*\I\s*(?<SiteSection>.*)\s*\I"  
| rex field="Site Section" ".*\I\s*(?<SiteSection>.*)\s*\|" 
| rex field="Site Section" ".*\|\s*(?<SiteSection>.*)\s*\I" 

However, it does not work for the strings below:
ASDASD ASDASDASDA ADASDADAD I AMC I IFC <=== returns empty
(most likely because the string "IFC" contains an uppercase letter "I")

ASDASD ASDASDASDA ADASDADAD I DISCO I ADASDA <=== returns "ISCO"
(most likely because the string "DISCO" contains an uppercase letter "I")

Any ideas how to modify my regular expression?
Thanks

gokadroid
Motivator

If it is still required, can you check this one, which should work in most cases:

your query to return events
| rex field=_raw "\s*(\s*\|\s*(?<captureMe>[^\|]+)\|\s*)"
| table captureMe

somesoni2
Revered Legend

Give this a try

Updated

your base search | rex field="Site Section" "\s(\||I)\s+(?<SiteSection>.+)\s+(\||I)\s" 

maximusdm
Communicator

It is a lot better, but if there is an uppercase letter "I" after the second pipe "|", it still does not work properly. Thanks

somesoni2
Revered Legend

A sample log where it's failing?

maximusdm
Communicator

If you have a string such as: ABCDE I AAA I IFC, the result will be "AAA I" and not "AAA" as it should be.

somesoni2
Revered Legend

Will the value you want to capture always be a single word, or can it be multiple words?
Try the updated answer as well.

maximusdm
Communicator

With your update, only one string failed, and that is because there is no space after the second pipe "|", for instance:
AASSDDF DFGJKJ | A&E |FYI will return nothing.

PS: strings with 2 words between the pipes work just fine!
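
One way to see why that string returns nothing is to replay the accepted pattern outside Splunk. The sketch below is a Python port, not SPL: Python's `re` spells named groups `(?P<name>...)`, and the helper name `site_section` is hypothetical. It suggests the trailing `\s` is the culprit, since it demands whitespace after the closing delimiter and `|FYI` has none.

```python
import re

# Python port of the rex pattern from the accepted answer.
# Only change: (?<SiteSection>...) becomes (?P<SiteSection>...) for Python's re.
ACCEPTED = re.compile(r"\s(\||I)\s+(?P<SiteSection>.+)\s+(\||I)\s")

def site_section(value):
    """Hypothetical helper: return the captured text, or None when nothing matches."""
    match = ACCEPTED.search(value)
    return match.group("SiteSection") if match else None

print(site_section("ABCDE I AAA I IFC"))          # AAA
print(site_section("AASSDDF DFGJKJ | A&E |FYI"))  # None
```

Python's engine is not Splunk's PCRE, so treat this as an approximation of what rex does rather than a guarantee.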

somesoni2
Revered Legend

How about this?

your base search | rex field="Site Section" "\s(\||I)\s+(?<SiteSection>.+)\s+(\||I\s)" 

maximusdm
Communicator

Now it fails when there is no space before or after the first pipe, LOL.
For instance:
ASDF ASDF| A&E |FYI or
ASDF ASDF |A&E |FYI
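
A possible single-pattern alternative, sketched in Python and not taken from any post in this thread: treat a pipe anywhere, or an uppercase "I" with whitespace on both sides, as a delimiter, and capture lazily between the first two delimiters. The pattern and the name `between_delimiters` are assumptions for illustration. In rex the group would be written `(?<SiteSection>...)`, and the result there has not been verified.

```python
import re

# Delimiter: a pipe (spaces optional) or an uppercase I surrounded by whitespace.
# The lazy .+? stops at the first closing delimiter instead of the last one.
TOLERANT = re.compile(
    r"(?:\||\sI\s)\s*(?P<SiteSection>.+?)\s*(?:\||\sI\s)"
)

def between_delimiters(value):
    """Hypothetical helper: text between the first two delimiters, or None."""
    match = TOLERANT.search(value)
    return match.group("SiteSection") if match else None

samples = [
    "ASDASD ASDASDASDA ADASDADAD I AMC I IFC",
    "ASDASD ASDASDASDA ADASDADAD I DISCO I ADASDA",
    "AASSDDF DFGJKJ | A&E |FYI",
    "ASDF ASDF| A&E |FYI",
    "ASDF ASDF |A&E |FYI",
]
for sample in samples:
    print(between_delimiters(sample))
```

One caveat of this design: a standalone word "I" in the surrounding text would also be read as a delimiter, so it only suits data where that cannot happen.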

maximusdm
Communicator

I resolved my problem by replacing each standalone " I " with a pipe before running the extraction regular expression.

| rex field="Site Section" mode=sed "s,\sI\s, | ,g"
| rex field="Site Section" ".*\|\s*(?<SiteSection>.*)\s*\|"

I want to thank you for pointing me in the right direction.
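
The two-step idea above (normalize the delimiter first, then extract) can be approximated outside Splunk as follows. This is a rough Python sketch under assumptions: `re.sub` stands in for the sed-mode rex, the extraction pattern is the pipe-only one from the original question rewritten with Python's `(?P<name>...)` syntax, and `extract_site_section` is a hypothetical name.

```python
import re

# Step 1 (stands in for the sed-mode rex): a standalone " I " becomes " | ".
DELIMITER_I = re.compile(r"\sI\s")
# Step 2 (stands in for the extraction rex): capture the text between pipes.
BETWEEN_PIPES = re.compile(r".*\|\s*(?P<SiteSection>.*)\s*\|")

def extract_site_section(value):
    """Hypothetical helper: normalize delimiters, then capture between pipes."""
    normalized = DELIMITER_I.sub(" | ", value)
    match = BETWEEN_PIPES.search(normalized)
    # strip() because the greedy capture can keep a trailing space.
    return match.group("SiteSection").strip() if match else None

print(extract_site_section("ASDASD ASDASDASDA ADASDADAD I AMC I IFC"))       # AMC
print(extract_site_section("ASDASD ASDASDASDA ADASDADAD I DISCO I ADASDA"))  # DISCO
print(extract_site_section("ASDF ASDF| A&E |FYI"))                           # A&E
```

Normalizing first keeps the extraction pattern simple, because it only has to deal with one delimiter, and pipes need no surrounding spaces to match.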
