Splunk Search

RegEx for pattern matching and extraction

mbasharat
Builder

Hi,

I have data that contains a session ID (labeled SES) and a user ID (labeled ABC).

When I look at the events, I see the variations below. The RegEx should grab anything that is 14 digits followed by zero or more groups of a hyphen with 9 digits or a hyphen with 0 digits. I need a RegEx that extracts SES and ABC into separate fields from these variations.

Formats seen:
SES
SES-ABC
SES--ABC
SES--ABC-
SES-ABC-ABC

Sample data:
1234567-123456789---
1234567-1234567890-123456789--
12345678-123456789--A12345678-123456789
123456789
12345678900000
12345ac4-1234-1a12-9as9-1aa111as23aa
12345678900000-123456789
12345678900000-123456789-1234567890

Thanks in advance.


to4kawa
Ultra Champion
| makeresults 
| eval _raw="raw
1234567-123456789---
1234567-1234567890-123456789--
12345678-123456789--A12345678-123456789
123456789
12345678900000
12345ac4-1234-1a12-9as9-1aa111as23aa
12345678900000-123456789
12345678900000-123456789-1234567890" 
| multikv forceheader=1 
| rex max_match=2 "(?<SES>^\d+)|-(?<ABC>\d+)(?:-|$)" 
| eval SES=trim(SES,"0"), ABC=trim(ABC,"0")

Use rex with max_match to limit the number of matches.
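For anyone who wants to check the pattern outside Splunk, here is a rough Python sketch of what the rex and eval lines above appear to do. It is an approximation, not Splunk's implementation: the function name `extract` is made up, Python needs `(?P<name>...)` instead of `(?<name>...)`, and capping the matches with `islice` is only my assumption of how `max_match=2` behaves.

```python
import re
from itertools import islice

# Same alternation as the rex above, rewritten with Python's named-group syntax.
PATTERN = re.compile(r"(?P<SES>^\d+)|-(?P<ABC>\d+)(?:-|$)")


def extract(raw, max_match=2):
    """Collect SES and ABC values from at most max_match regex matches."""
    fields = {"SES": [], "ABC": []}
    for match in islice(PATTERN.finditer(raw), max_match):
        for name, value in match.groupdict().items():
            if value is not None:
                # Mirrors eval trim(X, "0"): strip zeros from both ends.
                fields[name].append(value.strip("0"))
    return fields


if __name__ == "__main__":
    for sample in (
        "1234567-1234567890-123456789--",
        "12345678900000-123456789-1234567890",
        "12345ac4-1234-1a12-9as9-1aa111as23aa",
    ):
        print(sample, extract(sample))
```

Note that for 1234567-1234567890-123456789-- the 9-digit ABC comes from trimming the trailing zero off 1234567890, not from the third group, and the odd alphanumeric value still yields partial matches (12345 and 1234).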


mbasharat
Builder

After working with the customer, the data has been fixed at the source. The RegEx above works perfectly now. THANK YOU!

to4kawa
Ultra Champion

12345ac4-1234-1a12-9as9-1aa111as23aa
Where are SES and ABC in this one?

mbasharat
Builder

Hi @to4kawa, this one is a very odd pattern, and I was scratching my head when I looked at it too. Let me try the solution you provided. I will report back shortly.

jpolvino
Builder

Hi, can you please provide a little more detail? Specifically, for the examples you provided, which SES and ABC values do you expect to be extracted from the valid ones? And which of them should not match anything?

When you have ABC twice (the last line under "Formats seen"), is that literally the same ABC twice, or two different ABCs?

mbasharat
Builder

Hi @jpolvino,

I only need SES and ABC extracted from the above patterns. In the last example, ABC appears twice. It is the same ABC, but the second one has an additional digit or character. I need only the 9-digit ABC, which is the middle one in the last example.

Sample data:
1234567-123456789--- (Need 9 digit ABC only, 123456789)
1234567-1234567890-123456789-- (Need 9 digit ABC only, 123456789)
12345678-123456789--A12345678-123456789 (Need 9 digit ABC only, 123456789, last one)
123456789 (Need 9 digit ABC only, 123456789)
12345678900000 (Need 9 digit ABC only, 123456789)
12345ac4-1234-1a12-9as9-1aa111as23aa (This I am trying to figure out with data owners to clarify this pattern)
12345678900000-123456789 (Need 9 digit ABC only, 123456789)
12345678900000-123456789-1234567890 (Need 9 digit ABC only, 123456789, the middle one)
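To make the "9 digit ABC only" expectation above concrete, here is a small Python sketch of one possible reading: split the value on hyphens and take the first segment that is exactly 9 digits. The function name `first_nine_digit_token` is hypothetical, and the rule itself is an assumption, not something confirmed by the data owners.

```python
import re

# A segment counts as an ABC only if it is exactly nine digits.
NINE_DIGITS = re.compile(r"\d{9}")


def first_nine_digit_token(raw):
    """Return the first hyphen-delimited segment that is exactly 9 digits, else None."""
    for token in raw.split("-"):
        if NINE_DIGITS.fullmatch(token):
            return token
    return None


if __name__ == "__main__":
    for sample in (
        "1234567-1234567890-123456789--",
        "12345678-123456789--A12345678-123456789",
        "12345678900000-123456789-1234567890",
        "12345678900000",
    ):
        print(sample, first_nine_digit_token(sample))
```

Under this reading a lone 12345678900000 yields nothing, because getting 123456789 out of it would need an extra zero-trimming step, and a SES that happens to be 9 digits long would be picked up as ABC.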
