I have a field called Title, which may sometimes end with the text
Ends 9 P.M.
or case-related variants of it.
I can easily strip that suffix in my search with
| rex mode=sed field=Title "s/(?i) Ends 9.?p.?m.?//"
which does the job nicely, but I want this to happen as standard, so I tried setting up a transform and field extraction with the following regex:
(.*)((?i) ends [0-9]*.?[ap].?m)?
but the optional ? at the end of the 'ends...' group means that the first (.*) will capture all text, including the 'ends...' section, so the result is no change.
If I get rid of the last ?, it works for fields that have the 'ends...' text, but fields without it no longer match at all, so they lose their value.
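The two behaviours described above can be reproduced outside Splunk. This is only a rough Python sketch: Python's re module stands in for Splunk's PCRE engine, the sample titles are made up, and the inline (?i) is rewritten as the scoped form (?i:...) because Python 3.11+ only accepts a bare (?i) at the start of a pattern.

```python
import re

# Hypothetical Python stand-ins for the transform regex, with and without
# the trailing "?" on the suffix group.
OPTIONAL_SUFFIX = re.compile(r"(.*)((?i: ends [0-9]*.?[ap].?m))?")
REQUIRED_SUFFIX = re.compile(r"(.*)((?i: ends [0-9]*.?[ap].?m))")

with_suffix = "Weekend Unreserved Ends 9 P.M."
without_suffix = "Weekend Unreserved"

# Optional group: the greedy (.*) takes the whole string and the suffix group
# is allowed to match nothing, so group 1 still contains "Ends 9 P.M.".
m = OPTIONAL_SUFFIX.match(with_suffix)
greedy_capture = m.group(1)
suffix_group = m.group(2)  # None: the optional group did not participate

# Required group: (.*) is forced to backtrack and leave the suffix to group 2...
required_capture = REQUIRED_SUFFIX.match(with_suffix).group(1)

# ...but a title without the suffix no longer matches at all.
no_suffix_match = REQUIRED_SUFFIX.match(without_suffix)

print(repr(greedy_capture), repr(suffix_group))
print(repr(required_capture), no_suffix_match)
```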
Any help on the right regex, or a way to set up a 'sed' style regex in conf?
Try this -
(?i)(^.*(?=\s*ends\s+\d+\s?[ap]\.?m\.?.*)|^.*)
This is a case-insensitive flag, (?i), followed by a single capture group which has two options. The first option is anything, followed by a positive lookahead, (?=...), for a value like " ends 9 pm". You'll notice I've allowed for 2-digit hours, etc. If that one fails, the second option takes everything. Both options require the match to start at the beginning of the string, with the first one ending at the start of the positive lookahead, and the second option taking the entire string.
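For anyone wanting to sanity-check the pattern outside Splunk, here is a rough Python sketch. Python's re module is only a stand-in for Splunk's PCRE engine, and capture_title is a made-up helper name, not anything Splunk provides. One thing it surfaces: in Python at least, the capture keeps the space before "Ends", because the greedy .* stops at the last position where the lookahead succeeds and the leading \s* in the lookahead is happy to match nothing.

```python
import re

# The suggested pattern, unchanged. Group 1 is the only capture group, so the
# same group number applies whether or not the "ends ..." suffix is present.
TITLE_RE = re.compile(r"(?i)(^.*(?=\s*ends\s+\d+\s?[ap]\.?m\.?.*)|^.*)")

def capture_title(title):
    """Return group 1: the title up to the 'ends ...' suffix, or all of it."""
    return TITLE_RE.match(title).group(1)

# Made-up sample titles for illustration.
for sample in ("Weekend Unreserved Ends 9 P.M.",
               "Late Show ends 10pm",
               "Weekend Unreserved"):
    captured = capture_title(sample)
    # repr() makes any trailing space visible; rstrip() removes it if unwanted.
    print(repr(captured), "->", repr(captured.rstrip()))
```

If the trailing space matters, trimming the value afterwards is one way to deal with it in this sketch.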
Ah, that's the trick with the positive lookahead... That single capture group is the key, which means I can use Title::$1 in the transforms.conf and it works.
Out of interest, would lookaround work to remove prefixes from strings? I played around with a few attempts, but I don't see how it would.
Thanks
Updated. Correct tool is definitely not lookaround.
Just need to take group 2 from this one:
^(drop this prefix )?(.*)
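A quick Python sketch of why this works for prefixes when the mirror-image pattern failed for suffixes: the optional group comes first, so it gets the chance to consume the prefix before (.*) sees anything, and the wanted text is always in group 2. Python's re is a stand-in for Splunk's engine here, and drop_prefix plus the sample values are invented for illustration.

```python
import re

# The literal prefix from the pattern above; group 1 is optional, group 2 is
# always the remainder of the string.
PREFIX_RE = re.compile(r"^(drop this prefix )?(.*)")

def drop_prefix(value):
    """Return group 2: the value with the optional prefix removed."""
    return PREFIX_RE.match(value).group(2)

print(drop_prefix("drop this prefix Weekend Unreserved"))
print(drop_prefix("Weekend Unreserved"))
```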
Hi bowesmana,
try out this regex and see if it will do the trick.
https://regex101.com/r/3MSGhl/2
(.+)(?:((?i)\sends\s[1]?[0-9]\s[ap]\.m\.))$
That has the same problem as my original, i.e. it does not capture anything in the capture group unless the text has the trailing "ends..." phrase, e.g.
The phrase
Weekend Unreserved
gets an empty capture group 1 as the regex requires the "ends..." to be present to result in a match
I could make two alternatives within the regex, but then I am not sure how to assign the correct numbered capture group to the Title.
How about this one?
Still has the same issue. How does that work with transforms.conf, where the assignment is done with
Title::$X
where X is the capture group number? Unless it's always the same number, how do you assign more than one capture group to the same field?