Splunk Search

How do I edit my regex to parse fields correctly if a field delimiter appears within a field?

markwymer
Path Finder

Hi,

Another regex problem I'm afraid.....

I've got a very long event with 37 fields where all the fields are quoted and separated by comma. Also there are no key=value pairs.
For the most part my regex works nicely with the event data, but there are occasions where a quote also appears in the actual field data thereby breaking my regex separator character.

Working example (extremely simplified regex and event):

^"(?P<dest_ip>[^"]+)","(?P<dest_port>[^"]+)","(?P<uri>[^"]+)","(?P<request>[^"]+)","(?P<response>[^\n]+)"$

Data:

"192.0.0.20","80","fl=city,name,code,group=true&group.field=city","GET /solr/lpbm/select?fl=city","Logging rate limit reached"

No problem with this, all the fields parse out OK. However, this next event fails - note the additional " in fourth field:-

"192.0.0.20","80","fl=city,name,code,group=true&group.field=city","GET /solr/"lpbm"/select?fl=city","Logging rate limit reached"

This now breaks the [^"]+)"," part of my regex and distorts the field extractions.

Is there a way to do the equivalent of:-

......","(?P<request>[^","]+)",".......

I know that this is invalid, but I don't know what the alternative looks like 😞 !!

Thanks for any help,
Mark.

0 Karma
1 Solution

Sebastian2
Path Finder

Your problem should be solvable by using non greedy (or lazy) quantifiers instead of the [^"] syntax. The advantage is, that you can use the whole pattern "," as seperator instead of just [^"]. How ever, I'm not sure if the Splunk RegEx works as I expect to do, but try (something like) this:

^"(?P<dest_ip>.+?)","(?P<dest_port>.+?)","(?P<uri>.+?)","(?P<request>.+?)","(?P<response>[^\n]+)"$

What's the difference:

  • I'd say the [^"] syntax is "old school". The parser is consuming just everything until an " is found.
  • Lazy quantifiers, how ever, parse as much as they can. And "as much" means: As much as possible unless the whole pattern doesn't match. In theory this should (I can't test that right now) therefore consume a single " but no "," as the pattern would no longer match as a whole. (And it should be a little bit slower, again, in theory)

/edit & just as info: a ? makes an quantifier lazy (here: .+?: "Consume lazy at least one character").

View solution in original post

javiergn
SplunkTrust
SplunkTrust

Try the following:

^"(?P<dest_ip>[^"]+)","(?P<dest_port>[^"]+)","(?P<uri>[^"]+)","(?P<request>[^"][^,]+)","(?P<response>[^\n]+)"$

You can test it here: https://regex101.com/r/nD3sL1/2

Sebastian2
Path Finder

Your problem should be solvable by using non greedy (or lazy) quantifiers instead of the [^"] syntax. The advantage is, that you can use the whole pattern "," as seperator instead of just [^"]. How ever, I'm not sure if the Splunk RegEx works as I expect to do, but try (something like) this:

^"(?P<dest_ip>.+?)","(?P<dest_port>.+?)","(?P<uri>.+?)","(?P<request>.+?)","(?P<response>[^\n]+)"$

What's the difference:

  • I'd say the [^"] syntax is "old school". The parser is consuming just everything until an " is found.
  • Lazy quantifiers, how ever, parse as much as they can. And "as much" means: As much as possible unless the whole pattern doesn't match. In theory this should (I can't test that right now) therefore consume a single " but no "," as the pattern would no longer match as a whole. (And it should be a little bit slower, again, in theory)

/edit & just as info: a ? makes an quantifier lazy (here: .+?: "Consume lazy at least one character").

Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

Splunk is officially part of Cisco

Revolutionizing how our customers build resilience across their entire digital footprint.   Splunk ...

Splunk APM & RUM | Planned Maintenance March 26 - March 28, 2024

There will be planned maintenance for Splunk APM and RUM between March 26, 2024 and March 28, 2024 as ...