Splunk Search

How do I optimize my regex for a field extraction to improve efficiency and searchability?

HattrickNZ
Motivator

I am working on a field extraction in Splunk and have come up with the regex below

(Splunk regex does not work the same here)

^[^'\n]*'(?P<field1>\d+)

which pulls this value out:
79037030601

of the following events:

beginTime="2015-07-29T09:00:00+12:00",elementType="MSCServer",userLabel="MSCKPR",measInfoId=83888334,duration="PT3600S",endTime="2015-07-29T10:00:00+12:00",measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K'79037030601",c84162779=1,c84162780=1

Now what I want to do is optimize this regex for time efficiency and searchability in the events.
I am trying to use here to help me optimize it. One example I am working on here is this

How can I work on this regex and then apply it in Splunk? I don't think they are the same, or are they?

MuS
SplunkTrust

Hi HattrickNZ,

using https://regex101.com/ and your provided example I came up with this easy regex:

'(?P<field1>\d+)"

Does this work for all events?

UPDATE: to use it in Splunk, use this: ... | rex "'(?P<field1>\d+)\"" | ...

cheers, MuS
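
As a quick sanity check outside Splunk, the two patterns from this thread can be compared on the sample event. This is only a sketch in Python, whose `re` module is close to, but not the same as, the PCRE-style engine behind `rex`; the names `EVENT`, `ANCHORED`, `SHORT` and `extract` are made up for illustration.

```python
import re

# The sample event from the question, split across lines for readability.
EVENT = (
    'beginTime="2015-07-29T09:00:00+12:00",elementType="MSCServer",'
    'userLabel="MSCKPR",measInfoId=83888334,duration="PT3600S",'
    'endTime="2015-07-29T10:00:00+12:00",'
    'measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K\'79037030601",'
    'c84162779=1,c84162780=1'
)

# Original pattern: anchored at the start, skips everything up to the first '.
ANCHORED = re.compile(r"^[^'\n]*'(?P<field1>\d+)")
# Shorter pattern: a ' followed by digits and a closing ".
SHORT = re.compile(r"'(?P<field1>\d+)\"")


def extract(pattern, event):
    """Return the field1 capture, or None when the pattern does not match."""
    match = pattern.search(event)
    return match.group("field1") if match else None


anchored_value = extract(ANCHORED, EVENT)
short_value = extract(SHORT, EVENT)
print(anchored_value, short_value)
```

Both patterns pull the same digits out of this one event; whether they agree on every event depends on the rest of the data.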

jeffland
SplunkTrust

To see the efficiency of your regexes in more detail than the step count displayed above the regex, you can also use the debug mode of regex101.com (on the left) to see where you might run into unnecessary steps (and to learn how regexes work in general).
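
Another rough way to compare patterns is to time them locally. The sketch below uses Python's `timeit`; since Python's `re` is not the engine Splunk uses, the numbers are only indicative and are no substitute for the step counts in regex101 or for testing in Splunk itself. `EVENT`, `PATTERNS`, `REPEATS` and `time_pattern` are illustrative names, and the event is abbreviated from the one in the question.

```python
import re
import timeit

# Abbreviated version of the sample event from the question.
EVENT = (
    'beginTime="2015-07-29T09:00:00+12:00",elementType="MSCServer",'
    'measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K\'79037030601",'
    'c84162779=1,c84162780=1'
)

PATTERNS = {
    "anchored": re.compile(r"^[^'\n]*'(?P<field1>\d+)"),
    "short": re.compile(r"'(?P<field1>\d+)\""),
}

REPEATS = 10000


def time_pattern(pattern, event, repeats=REPEATS):
    """Return the total seconds taken by `repeats` searches of `event`."""
    return timeit.timeit(lambda: pattern.search(event), number=repeats)


timings = {name: time_pattern(p, EVENT) for name, p in PATTERNS.items()}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s for {REPEATS} searches")
```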

HattrickNZ
Motivator

Yours works on regex101 for one event, but if I add more events it does not seem to work?
Mine seems to work in Splunk for all events, though.

MuS
SplunkTrust

can you provide the others as well?

MuS
SplunkTrust

And did you use the /g flag to match globally in regex101?

HattrickNZ
Motivator

Thanks,
the global /g flag did it, and it now works on all events.
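
The effect of the /g flag can be reproduced in Python as a rough analogy: `search` stops at the first match, much like regex101 without /g, while `findall` keeps going like /g. The second event line below is invented for illustration, and both lines are abbreviated. As far as I understand, Splunk applies `rex` to each event separately, which would explain why the anchored pattern works there without any extra flag; on pasted multi-line text it needs multiline mode (/m in regex101) so that ^ matches at the start of every line.

```python
import re

# Two abbreviated events pasted together, as in a regex101 test string.
# The second line is a made-up example, not taken from the thread.
EVENTS = "\n".join([
    'measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K\'79037030601",c84162779=1',
    'measObjLdn="MSCKPR/ALL HLR:MSCKPR/HLR Number:HLR Number = K\'79037030602",c84162779=2',
])

SHORT = re.compile(r"'(?P<field1>\d+)\"")
ANCHORED = re.compile(r"^[^'\n]*'(?P<field1>\d+)")
ANCHORED_MULTILINE = re.compile(r"^[^'\n]*'(?P<field1>\d+)", re.MULTILINE)

# Without "global": only the first match is reported.
first_only = SHORT.search(EVENTS).group("field1")
# With "global": every match in the text is reported.
all_matches = SHORT.findall(EVENTS)

# The anchored pattern only sees the first line unless ^ may match per line.
anchored_default = ANCHORED.findall(EVENTS)
anchored_per_line = ANCHORED_MULTILINE.findall(EVENTS)

print(first_only, all_matches, anchored_default, anchored_per_line)
```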

But what is the difference between this:

^[^'\n]*'(?P<field1>\d+)

and this:

'(?P<field1>\d+)"

It looks like the only difference is that the characters ^[^'\n]* are missing from the start.

Also, this does not work in Splunk (I get an "Unbalanced quotes" error):

... | rex '(?P<field1>\d+)" | stats count(field1) by field1

But this does:

... | rex "^[^'\n]*'(?P<field2>\d+)" | stats count(field2) by field2

Sorry for all the questions, I am just trying to understand this better.
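
The "Unbalanced quotes" error appears to come from the lone " inside the pattern: rex expects the regex as a double-quoted string, so a " that belongs to the regex has to be written as \". A minimal sketch of that quoting rule, using a hypothetical helper `rex_command` (not part of any Splunk tooling) that assumes the pattern is passed in without any quotes escaped yet:

```python
def rex_command(pattern: str) -> str:
    """Wrap a regex in double quotes for rex, escaping embedded double quotes.

    Assumes `pattern` is the bare regex with no quotes escaped so far.
    """
    escaped = pattern.replace('"', '\\"')
    return f'rex "{escaped}"'


# The short pattern ends in a double quote, so that quote gets escaped.
short_cmd = rex_command("'(?P<field1>\\d+)\"")
# The anchored pattern contains no double quote and is left unchanged.
anchored_cmd = rex_command(r"^[^'\n]*'(?P<field1>\d+)")

print(short_cmd)
print(anchored_cmd)
```

The first printed command has the closing " of the pattern escaped, which is what keeps the quotes balanced.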

HattrickNZ
Motivator

For my reference:

'(?P<field1>\d+)"

' - matches the first ' (single quote)
\d+ - \d matches the first digit after the ' (single quote), and + matches all the digits that follow, stopping before the " (double quote)
() - this has something to do with what to capture
?P - not sure, but I think it either picks the first character for selection, OR matches the character P literally (case sensitive), OR has something to do with storing the value in the field name name1

for example:
'\d+ - will highlight '79037030601
'(?P)\d+ - will highlight '79037030601, but it looks like the cursor is just before the first 7, so I am not sure if the ?P is required
'(\d+) - will highlight '79037030601, with the numbers 79037030601 in blue and the ' in green, so again I am not sure if the ?P is required
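
The three variations above can be tried out in Python to see what the parentheses and the group name change. This is a sketch with an abbreviated sample string; note that Python's `re` only accepts the `(?P<name>...)` spelling for a named group.

```python
import re

# Abbreviated tail of the sample event.
TEXT = "HLR Number = K'79037030601\",c84162779=1"

plain = re.search(r"'\d+", TEXT)               # no group at all
unnamed = re.search(r"'(\d+)", TEXT)           # numbered capturing group
named = re.search(r"'(?P<field1>\d+)", TEXT)   # named capturing group

# The overall match is the same in all three cases and includes the quote.
plain_match = plain.group(0)
# A capturing group isolates just the digits, addressed by number...
unnamed_value = unnamed.group(1)
# ...or by name, which is what allows the value to be stored under a field name.
named_fields = named.groupdict()

print(plain_match, unnamed_value, named_fields)
```

So the parentheses decide what is captured, and the name decides what the capture is called.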

MuS
SplunkTrust

The (?<name>...) syntax is a named capturing group, and you can write it with or without the P, i.e. (?P<name>...) or (?<name>...); both will work. Also, regex101.com shows an explanation of your regex at the top right.

MuS
SplunkTrust

It should be like this in Splunk:

... | rex "'(?P<field1>\d+)\"" | stats count(field1) by field1

To explain it: it matches a ' (single quote) and creates a capturing group of all the digits up to the next " (double quote), whereas your original regex breaks down like this:

^ asserts position at the start of the string
[^'\n]* matches a single character not present in the list below
    Quantifier: * between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    ' the literal character '
    \n matches a line-feed (newline) character (ASCII 10)
' matches the character ' literally
(?P<field1>\d+) named capturing group field1
    \d+ matches a digit [0-9]
    Quantifier: + between one and unlimited times, as many times as possible, giving back as needed [greedy]
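
The same breakdown can be kept next to the pattern itself. As a sketch, Python's `re.VERBOSE` flag allows whitespace and comments inside the regex; this is shown for Python only, and `SAMPLE` is an abbreviated version of the event from the question.

```python
import re

# The original anchored pattern, annotated piece by piece.
ANCHORED = re.compile(
    r"""
    ^                   # start of the string
    [^'\n]*             # any run of characters other than ' and newline (greedy)
    '                   # the literal single quote
    (?P<field1>\d+)     # named group field1: one or more digits (greedy)
    """,
    re.VERBOSE,
)

# Abbreviated sample event.
SAMPLE = 'measObjLdn="MSCKPR/HLR Number:HLR Number = K\'79037030601",c84162779=1'

value = ANCHORED.search(SAMPLE).group("field1")
print(value)
```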