Splunk Search

Strange behavior in regex extraction

edrivera3
Builder

Hi,
I want to extract the multi-value field "step". This is what my event looks like:

STEP: 1005
RESULT: PASS
ACTUAL:
RETRIES: 1

STEP: 1006
RESULT: PASS
ACTUAL:

STEP: 1009
RESULT: PASS
EXPECTED: 90.5
ACTUAL: 91.0
STEP: 1011
RESULT: PASS
ACTUAL:

STEP: 1015
RESULT: PASS
ACTUAL:

I have the following regex:
... | rex "(?<step>STEP:\s{6}\d+[\w\W\n]+?)STEP:\s{6}" max_match=0

But for some strange reason this regex skips every other step, so I only extracted steps 1005, 1009, and 1015. I believe the problem is related to how the regex engine scans the event. After a step is extracted, the match has already consumed the "STEP:\s{6}" of the next step, so the engine cannot start a new match there and continues forward until it reaches the step after that.
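
The skipping described above can be reproduced outside Splunk. The sketch below uses Python's re module as a stand-in for rex (an assumption: Python writes named groups as (?P<name>...) rather than (?<name>...)), with a sample rebuilt from the event shown in this post and \s+ in place of \s{6}.

```python
import re

# Sample rebuilt from the event in the post; exact whitespace is an assumption.
EVENT = (
    "STEP: 1005\nRESULT: PASS\nACTUAL:\nRETRIES: 1\n\n"
    "STEP: 1006\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1009\nRESULT: PASS\nEXPECTED: 90.5\nACTUAL: 91.0\n"
    "STEP: 1011\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1015\nRESULT: PASS\nACTUAL:\n"
)

# Same shape as the rex pattern: the trailing "STEP:" is part of the match,
# so it is consumed and cannot begin the next match.
CONSUMING = re.compile(r"(?P<step>STEP:\s+\d+[\w\W]+?)STEP:\s+")


def step_numbers(pattern, text):
    """Return the step number that opens each non-overlapping match."""
    return [
        re.search(r"\d+", m.group("step")).group()
        for m in pattern.finditer(text)
    ]


consumed = step_numbers(CONSUMING, EVENT)
print(consumed)  # ['1005', '1009']
```

Steps 1006 and 1011 are lost because their "STEP:" prefix was already consumed as the terminator of the previous match. In this five-step sample 1015 is dropped as well, since no further "STEP:" follows it; it presumably shows up in my real extraction because the actual event continues beyond the excerpt.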

This is what I extracted in the field "step":
STEP: 1005
RESULT: PASS
ACTUAL:

RETRIES: 1


STEP: 1009
RESULT: PASS
EXPECTED: 90.5
ACTUAL: 91.0


STEP: 1015
RESULT: PASS
ACTUAL:

As you can see I am catching the correct pattern with this regex. Please let me know what I could do to extract all the values for this field.

1 Solution

richgalloway
SplunkTrust

Rex was skipping STEPs because your regex required two instances of "STEP" to constitute a match, and the second one was consumed along with it. Using a lookahead helps. This regex string works on regex101.com: (?<step>STEP:[\w\W\n]+?)(?=STEP|$).
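
A minimal sketch of the lookahead idea, using Python's re module as a stand-in for rex (an assumption; Python spells the named group (?P<step>...)). The sample event is rebuilt from the question, and its exact whitespace is assumed. Because a lookahead only checks for the next "STEP" without consuming it, the following match can start there.

```python
import re

# Sample rebuilt from the event in the question; exact whitespace is an assumption.
EVENT = (
    "STEP: 1005\nRESULT: PASS\nACTUAL:\nRETRIES: 1\n\n"
    "STEP: 1006\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1009\nRESULT: PASS\nEXPECTED: 90.5\nACTUAL: 91.0\n"
    "STEP: 1011\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1015\nRESULT: PASS\nACTUAL:\n"
)

# (?=STEP|$) asserts that "STEP" or the end of the text comes next,
# but leaves it in place for the next match to use.
LOOKAHEAD = re.compile(r"(?P<step>STEP:[\w\W]+?)(?=STEP|$)")

steps = [m.group("step") for m in LOOKAHEAD.finditer(EVENT)]
numbers = [re.search(r"\d+", s).group() for s in steps]
print(numbers)  # ['1005', '1006', '1009', '1011', '1015']
```

Here no multiline flag is set, so $ only matches at the end of the text and the last step is captured in full.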

---
If this reply helps you, Karma would be appreciated.


edrivera3
Builder

Thank you. A lookahead is exactly what I needed! This is my regex now:
(?<step>[\w\W\n]+?)(?=STEP)


richgalloway
SplunkTrust

That regex will probably miss the last STEP. That's why my regex string included |$.


edrivera3
Builder

You are right, I am missing the last STEP. But when I include "|$", I only extract the first line of each step:
This is what I extracted in the field "step":
STEP: 1005
STEP: 1006
STEP: 1009
STEP: 1011
STEP: 1015

So I would rather miss the last step than lose information from all the other steps. Do you have any idea how to avoid this?


richgalloway
SplunkTrust

Examine your data closely to see if there is anything else you can use as a terminator.
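
One candidate terminator is an absolute end-of-text anchor. The thread does not establish why "|$" cut every value down to its first line; one possible explanation (an assumption, not confirmed here) is that $ was being matched at the end of every line, as it is under a multiline flag. The sketch below reproduces that effect with Python's re module as a stand-in for rex and compares it with \Z, which in Python matches only at the very end of the text. To my understanding the PCRE spelling of that anchor is \z. The sample event and its whitespace are assumed.

```python
import re

# Sample rebuilt from the event in the question; exact whitespace is an assumption.
EVENT = (
    "STEP: 1005\nRESULT: PASS\nACTUAL:\nRETRIES: 1\n\n"
    "STEP: 1006\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1009\nRESULT: PASS\nEXPECTED: 90.5\nACTUAL: 91.0\n"
    "STEP: 1011\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1015\nRESULT: PASS\nACTUAL:\n"
)

WITH_DOLLAR = r"(?P<step>STEP:[\w\W]+?)(?=STEP:|$)"
WITH_END = r"(?P<step>STEP:[\w\W]+?)(?=STEP:|\Z)"

# Under MULTILINE, $ succeeds at every line end, so the lazy quantifier
# stops at the end of the first line of each step.
per_line = [m.group("step") for m in re.finditer(WITH_DOLLAR, EVENT, re.MULTILINE)]

# \Z ignores the multiline flag and only succeeds at the end of the text.
per_step = [m.group("step") for m in re.finditer(WITH_END, EVENT, re.MULTILINE)]

print(per_line)                    # five one-line values, "STEP: 1005" etc.
print(per_step[-1].splitlines())   # ['STEP: 1015', 'RESULT: PASS', 'ACTUAL:']
```

If the event really is being scanned line by line for $, swapping in the absolute anchor keeps both the full body of each step and the last step.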


edrivera3
Builder

I just realized that I can reduce my regex to simply:
... | rex "(?<step>STEP:[\w\W\n]+?)STEP:" max_match=0

This regex gives me the same results, so it doesn't change anything. 😞
