Hello,
I am trying to put together a regex to extract a string. The issue I have is that the string sometimes contains dashes as a separator, as in 11-23345-6778-CMP, and sometimes there is simply a space, as in 11 23345 8897 CMP.
I have a regular expression that extracts the string with the dashes, but I am struggling to work out how to also ask the same expression to extract strings that have a space instead.
Is it even possible to combine the two?
The expression I have is:
rex "(?i)\\|.*?\\|(?P<POLICYNUMBERS>\\d+\\-[a-f0-9]+\\-\\w+)"
Any help or advice is as always greatly appreciated.
Cheers,
Alastair
First of all, please post your regexes as code; otherwise the markup will mess them up.
There are usually a few ways to get there with regex, and this case is no exception. You could set up alternatives to your dashes with | , but you can also just use a less precise item such as . which matches any single character and therefore accepts either a dash or whitespace in that position.
Loosely based on your original regex, it could look something like:
(?<POLICYNUMBERS>\d{2}.\d{5}.\d{4}.\w{3})
And lastly, you should use a tool like https://regex101.com/ to help you with any regex matters 🙂
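As a rough illustration of the separator idea, here is a small Python sketch (Python is used only for demonstration; it writes named groups as (?P<name>...) rather than (?<name>...)). It narrows the . separators to the character class [-\s], so only a dash or whitespace is accepted between the parts. The function name extract_policy and the sample strings around the policy numbers are made up for this example.

```python
import re

# Same shape as the pattern in the reply, but each "." separator is
# narrowed to [-\s], i.e. exactly one dash or one whitespace character.
POLICY_RE = re.compile(r"(?P<POLICYNUMBERS>\d{2}[-\s]\d{5}[-\s]\d{4}[-\s]\w{3})")


def extract_policy(text):
    """Return the first policy number found in text, or None."""
    match = POLICY_RE.search(text)
    return match.group("POLICYNUMBERS") if match else None


if __name__ == "__main__":
    # Hypothetical surrounding text; only the policy numbers come from the question.
    for sample in ("id=11-23345-6778-CMP end", "id=11 23345 8897 CMP end"):
        print(extract_policy(sample))
```

Using [-\s] instead of . is a slightly stricter choice: it still covers both formats from the question but rejects strings where some other character sits between the parts.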
This works; however, the format of the extracted string is not always the same. For example:
1-85-F792378
87-F833763-CMP
1 45 122434
I have attempted to use wildcards in the regex, but to no avail. Despite the explanation provided by regex101 looking correct, I am unable to extract the required information.
All rather frustrating and my severely limited knowledge of regex is not helping 😉
Cheers,
Alastair
We can get there using other means as well... for example, does the string have only the three variants you just posted, i.e. can we work with the number of characters possible in each position? Then something like this could work:
(?<POLICYNUMBERS>(?:\d(?:\s|\-)\d{2}(?:\s|\-)\w+|\d{2}\-\w{7}\-\w{3}))
Alternatively, the idea could be adjusted to allow some variation. This one, for example, reads a first element of one to three digits, then two elements of one to seven word characters each, and accepts a whitespace or a dash between them:
(?<POLICYNUMBERS>\d{1,3}(?:\s|\-)\w{1,7}(?:\s|\-)\w{1,7})
Be careful with this, as it may also match other data.
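To make the trade-off between the two patterns concrete, the following Python sketch runs both against the three sample values from the thread. It is only a demonstration under a few assumptions: Python needs (?P<name>...) for named groups, the names STRICT, LOOSE and extract are invented here, and the sentence "took 3 tries today" is a made-up example of unrelated data.

```python
import re

# Pattern built from the three known variants (fixed lengths per position).
STRICT = re.compile(
    r"(?P<POLICYNUMBERS>(?:\d(?:\s|\-)\d{2}(?:\s|\-)\w+|\d{2}\-\w{7}\-\w{3}))"
)
# More tolerant pattern: 1-3 digits, then two groups of 1-7 word characters.
LOOSE = re.compile(r"(?P<POLICYNUMBERS>\d{1,3}(?:\s|\-)\w{1,7}(?:\s|\-)\w{1,7})")


def extract(pattern, text):
    """Return the first POLICYNUMBERS match of pattern in text, or None."""
    match = pattern.search(text)
    return match.group("POLICYNUMBERS") if match else None


if __name__ == "__main__":
    for sample in ("1-85-F792378", "87-F833763-CMP", "1 45 122434"):
        print(sample, "->", extract(STRICT, sample), "|", extract(LOOSE, sample))

    # Hypothetical unrelated text: the loose pattern still finds a "match" here.
    print(extract(LOOSE, "took 3 tries today"))
    print(extract(STRICT, "took 3 tries today"))
```

Both patterns capture the three sample values in full, but only the loose one also picks up "3 tries today", which is the kind of false positive the warning above refers to.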
Or is your string uniquely identifiable based on what comes before and/or after it, i.e. does your data look like
[beginning of line]foo 1-85-F792378 some_identifier=x
[beginning of line]foo 87-F833763-CMP some_identifier=y
[beginning of line]foo 1 45 122434 some_identifier=z
Because then we could capture everything based on its position, lazily matching up to the text that follows it, with something like
^foo\s(?<POLICYNUMBERS>.+?)\ssome_identifier
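One point worth checking with the context-based approach: a negated character class such as [^...] excludes individual characters, not a whole word, so it stops at the first space and would cut "1 45 122434" short. A lazy match followed by the literal marker avoids that. The Python sketch below shows this variant; the prefix foo and the marker some_identifier= are the placeholder values from the example lines above, and the name extract_by_context is invented for illustration.

```python
import re

# Anchor on the known prefix, then lazily capture everything up to the
# whitespace that precedes the literal marker "some_identifier=".
LINE_RE = re.compile(r"^foo\s(?P<POLICYNUMBERS>.+?)\ssome_identifier=")


def extract_by_context(line):
    """Return the text between the 'foo ' prefix and ' some_identifier=', or None."""
    match = LINE_RE.search(line)
    return match.group("POLICYNUMBERS") if match else None


if __name__ == "__main__":
    lines = [
        "foo 1-85-F792378 some_identifier=x",
        "foo 87-F833763-CMP some_identifier=y",
        "foo 1 45 122434 some_identifier=z",
    ]
    for line in lines:
        print(extract_by_context(line))
```

Because the capture is defined by its surroundings rather than by its internal format, this version keeps working if further policy number layouts turn up, as long as the prefix and the trailing marker stay the same.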
Hello,
The first example worked a treat as the possible number / letter combination is limited to the three string variants.
Thank you so much for your help; it really is appreciated.
Cheers,
Alastair