
How to edit my regex to extract a variable string that may have either dashes or spaces?

ahogbin
Communicator

Hello,

I am trying to put together a regex to extract a string. The issue I have is that the string sometimes uses dashes as separators, as in 11-23345-6778-CMP, and sometimes simply spaces, as in 11 23345 8897 CMP.

I have a regular expression that extracts the string with the dashes, but I am struggling to work out how to make the same expression also extract strings that have spaces instead.

Is it even possible to combine the two?

The expression I have is:

rex "(?i)\\|.*?\\|(?P<POLICYNUMBERS>\\d+\\-[a-f0-9]+\\-\\w+)"

Any help or advice is as always greatly appreciated.

Cheers,

Alastair


jeffland
SplunkTrust

First of all, please post your regexes as code; otherwise the markup will mess them up.

As usual with regex, there are a few ways to get there. You could set up alternatives to your dashes with |, but you can also just use a less precise item such as ., which matches any single character and therefore accepts either a dash or whitespace in that position.

Loosely based on your original regex, it could look something like:

(?<POLICYNUMBERS>\d{2}.\d{5}.\d{4}.\w{3})
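The two options mentioned (alternation with | versus the looser .) can be compared in a minimal Python sketch. It assumes Python's `re` module as a stand-in for Splunk's PCRE engine; note that Python spells a named group `(?P<name>...)`. The slash-separated third sample is made up purely to show what . also lets through.

```python
import re

# Loose version: "." matches ANY single character, so a dash, a space,
# or anything else is accepted in the separator positions.
LOOSE = re.compile(r"(?P<POLICYNUMBERS>\d{2}.\d{5}.\d{4}.\w{3})")

# Stricter variant using alternation: only a dash or whitespace is
# accepted as the separator.
STRICT = re.compile(
    r"(?P<POLICYNUMBERS>\d{2}(?:-|\s)\d{5}(?:-|\s)\d{4}(?:-|\s)\w{3})"
)


def extract(pattern, text):
    """Return the POLICYNUMBERS value from the first match, or None."""
    m = pattern.search(text)
    return m.group("POLICYNUMBERS") if m else None


for sample in ["11-23345-6778-CMP", "11 23345 8897 CMP", "11/23345/6778/CMP"]:
    print(f"{sample!r}: loose={extract(LOOSE, sample)!r} strict={extract(STRICT, sample)!r}")
```

Both patterns handle dashes and spaces; only the loose one also accepts the slashes.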

And lastly, you should use a tool like https://regex101.com/ to help you with any regex matters 🙂

ahogbin
Communicator

This works... however, the format of the string is not always the same. For example:
1-85-F792378
87-F833763-CMP
1 45 122434

I have tried using wildcards in the regex, but to no avail: even though the explanation on regex101 looks correct, I am unable to extract the required information.

All rather frustrating and my severely limited knowledge of regex is not helping 😉

Cheers,

Alastair

jeffland
SplunkTrust

We can get there by other means as well. For example, does the string only ever take the three forms you just posted, i.e. can we rely on the number of characters in each position? If so, something like this could work:

(?<POLICYNUMBERS>(?:\d(?:\s|\-)\d{2}(?:\s|\-)\w+|\d{2}\-\w{7}\-\w{3}))
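A quick way to check this pattern against the three variants is a small Python sketch, again assuming `re` as a stand-in for Splunk's regex engine and using the `(?P<name>...)` spelling for the named group. The pattern is the one above, just split into commented pieces.

```python
import re

# Same pattern as in the answer, joined by implicit string concatenation.
THREE_VARIANTS = re.compile(
    r"(?P<POLICYNUMBERS>"
    r"(?:\d(?:\s|\-)\d{2}(?:\s|\-)\w+"  # 1 digit, sep, 2 digits, sep, word chars
    r"|\d{2}\-\w{7}\-\w{3})"            # 2 digits, dash, 7 chars, dash, 3 chars
    r")"
)


def extract(text):
    """Return the POLICYNUMBERS value from the first match, or None."""
    m = THREE_VARIANTS.search(text)
    return m.group("POLICYNUMBERS") if m else None


for sample in ["1-85-F792378", "87-F833763-CMP", "1 45 122434"]:
    print(f"{sample!r} -> {extract(sample)!r}")
```

For 87-F833763-CMP the first alternative fails at the second character (7 is not a separator), so the engine falls through to the second alternative.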

Alternatively, the idea could be adjusted to allow some variation. This one, for example, reads a group of one to three digits, then two groups of one to seven word characters, and accepts either whitespace or a dash between them:

(?<POLICYNUMBERS>\d{1,3}(?:\s|\-)\w{1,7}(?:\s|\-)\w{1,7})
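The risk with this looser pattern is easy to see in a small Python sketch (same assumptions: `re` standing in for Splunk's engine, `(?P<name>...)` syntax). The last two inputs are invented: one is not a policy number at all, and one has a final group longer than seven characters, which is silently cut short.

```python
import re

# 1-3 digits, then two groups of 1-7 word characters,
# each preceded by whitespace or a dash.
FLEXIBLE = re.compile(r"(?P<POLICYNUMBERS>\d{1,3}(?:\s|\-)\w{1,7}(?:\s|\-)\w{1,7})")


def extract(text):
    """Return the POLICYNUMBERS value from the first match, or None."""
    m = FLEXIBLE.search(text)
    return m.group("POLICYNUMBERS") if m else None


samples = [
    "1-85-F792378",    # variants from the thread
    "87-F833763-CMP",
    "1 45 122434",
    "10 apples sold",  # invented: not a policy number, but still matches
    "1-85-F7923789",   # invented: last group is 8 chars, match is truncated
]
for sample in samples:
    print(f"{sample!r} -> {extract(sample)!r}")
```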

Be careful with this, as it may match other data too.

Or is your string uniquely identifiable by what comes before and/or after it, i.e. does your data look like this:

[beginning of line]foo 1-85-F792378 some_identifier=x
[beginning of line]foo 87-F833763-CMP some_identifier=y
[beginning of line]foo 1 45 122434 some_identifier=z

Then we could capture the value based on its position, with something like:

^foo\s(?<POLICYNUMBERS>.+?)\s+some_identifier=
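The positional approach can be sketched in Python using the hypothetical foo / some_identifier lines from the example, assuming the value is always followed by whitespace and `some_identifier=`. It uses a lazy `.+?` up to that marker and, for comparison, also runs the negated character class `[^(\ssome\_identifier)]`. A character class excludes individual characters rather than a whole word, and because `\s` is one of them, it stops at the first space.

```python
import re

# Hypothetical log lines in the shape sketched in the thread.
SAMPLE = """foo 1-85-F792378 some_identifier=x
foo 87-F833763-CMP some_identifier=y
foo 1 45 122434 some_identifier=z"""

# Lazy .+? grows one character at a time and stops at the first point where
# whitespace + "some_identifier=" follows, so inner spaces are kept.
POSITIONAL = re.compile(r"^foo\s(?P<POLICYNUMBERS>.+?)\s+some_identifier=", re.MULTILINE)

# Negated character class for comparison: it excludes the single characters
# ( ) s o m e _ i d n t f r and any whitespace, so it stops at the first space.
CHAR_CLASS = re.compile(r"^foo\s(?P<POLICYNUMBERS>[^(\ssome\_identifier)]+)", re.MULTILINE)


def capture_all(pattern, text):
    """Return every POLICYNUMBERS value found in text."""
    return [m.group("POLICYNUMBERS") for m in pattern.finditer(text)]


print(capture_all(POSITIONAL, SAMPLE))
print(capture_all(CHAR_CLASS, SAMPLE))
```

With the character class, the space-separated variant comes back as just its first digit.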

ahogbin
Communicator

Hello,
The first example worked a treat, as the possible number/letter combinations are limited to those three variants.
Thank you so much for your help; it really is appreciated.
Cheers,

Alastair
