Solved: Regex for multiline events

melonman · ‎01-20-2012

How do I configure a regex to extract only the text after the colon on each line of the following log?

I have a log file containing events like this:

PID: 3047
CurrentTime: 2012/01/20 16:23:55
Username: username45
Floor: floor7
IPADDRESS: 10.1.1.4
Result: success
CurrentTime: 2012/01/20 16:23:54
Username: username51
Floor: floor3
IPADDRESS: 10.1.1.32
Result: fail
PID: 8020
CurrentTime: 2012/01/20 16:23:53
Username: username67
Floor: floor8
IPADDRESS: 10.1.1.24
Result: success
Additional: Some more information

and props.conf includes the following configuration:

[mytype]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^PID:
EXTRACT-result = ^Result: (?P<result>.+)$
EXTRACT-ipaddress = ^IPADDRESS: (?P<ipaddress>.+)$
EXTRACT-floor = ^Floor: (?P<floor>.+)$

To get only the value after the colon, such as floor8 for floor and 10.1.1.24 for ipaddress,
I ran a search for each field, but I am still getting unwanted text as well:

sourcetype="mytype" | table result

    result
--- ----------------------------------------
1   success
2   fail PID: 9360
3   fail PID: 6634
4   fail PID: 3908
5   fail PID: 1183
6   success PID: 8456
7   success PID: 5730
8   fail PID: 3004
9   fail PID: 278
10  fail PID: 7551

sourcetype="mytype" | table ipaddress

    ipaddress
--- ----------------------------------------
1   10.1.1.21 Result: success
2   10.1.1.34 Result: fail PID: 9360
3   10.1.1.9 Result: fail PID: 6634
4   10.1.1.21 Result: fail PID: 3908
5   10.1.1.33 Result: fail PID: 1183
6   10.1.1.8 Result: success PID: 8456
7   10.1.1.20 Result: success PID: 5730
8   10.1.1.32 Result: fail PID: 3004
9   10.1.1.8 Result: fail PID: 278
10  10.1.1.20 Result: fail PID: 7551

sourcetype="mytype" | table floor

    floor
--- ----------------------------------------
1   floor7 IPADDRESS: 10.1.1.21 Result: success
2   floor1 IPADDRESS: 10.1.1.34 Result: fail PID: 9360
3   floor5 IPADDRESS: 10.1.1.9 Result: fail PID: 6634
4   floor9 IPADDRESS: 10.1.1.21 Result: fail PID: 3908
5   floor3 IPADDRESS: 10.1.1.33 Result: fail PID: 1183
6   floor8 IPADDRESS: 10.1.1.8 Result: success PID: 8456
7   floor2 IPADDRESS: 10.1.1.20 Result: success PID: 5730
8   floor6 IPADDRESS: 10.1.1.32 Result: fail PID: 3004
9   floor0 IPADDRESS: 10.1.1.8 Result: fail PID: 278
10  floor4 IPADDRESS: 10.1.1.20 Result: fail PID: 7551

How do I configure the regex in props.conf to get only the text after the colon on each line?

Thank you in advance

Ayn · ‎01-20-2012

You need to activate multi-line mode matching for the regex by specifying (?m) at the start. Like this for instance:

EXTRACT-ipaddress = (?m)^IPADDRESS: (?P<ipaddress>.+)$

More information on multi-line mode matching in regular expressions:

http://www.regular-expressions.info/modifiers.html

http://www.regular-expressions.info/anchors.html
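A minimal sketch of what (?m) changes, using Python's re module as a stand-in for Splunk's PCRE engine (an assumption: like standard PCRE, Python leaves "dot matches newline" off by default). The event text is the last record from the sample log, merged into one string with its newlines kept.

```python
import re

# One merged event (newlines preserved), taken from the sample log above.
EVENT = (
    "PID: 8020\n"
    "CurrentTime: 2012/01/20 16:23:53\n"
    "Username: username67\n"
    "Floor: floor8\n"
    "IPADDRESS: 10.1.1.24\n"
    "Result: success\n"
    "Additional: Some more information"
)

# Without (?m), ^ only matches at the very start of the event,
# so a field on a later line is never found.
plain = re.search(r"^IPADDRESS: (?P<ipaddress>.+)$", EVENT)

# With (?m), ^ and $ match at every line boundary inside the event.
multiline = re.search(r"(?m)^IPADDRESS: (?P<ipaddress>.+)$", EVENT)

print(plain)                         # None
print(multiline.group("ipaddress"))  # 10.1.1.24
```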


hexxamillion · ‎10-31-2017

We have Splunk Enterprise 7.0.0.

I am trying to configure a sourcetype for a multiline event. My regex tests successfully on regex101.com, but I do not get the same results in Splunk when I set up the sourcetype.

This example log has 400+ lines. I know the word that starts and the word that ends the match for the event. I just need to match from the line starting with PRPM down to the line with the word END. I should also note that I had to add the MAX_EVENTS setting due to the length of the event data.

Example:
PRPM*28 blah blah blah blah blah
blah blah blah
blah ........blah
blah blah
....
..blah blah
END

This regex works on regex101.com but not in Splunk: (?s)^PRPM(.*?END)

I also tried with (?m). Suggestions?
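A small sketch, again using Python's re as a PCRE-like stand-in, that checks the pattern itself on a hypothetical stand-in for the record above. It only confirms what regex101 shows: (?s) is what lets the lazy .*? cross lines to reach END. It does not model how Splunk breaks or merges the raw data into events before the regex runs.

```python
import re

# Hypothetical stand-in for the 400+ line record in the post.
RAW = (
    "PRPM*28 blah blah blah blah blah\n"
    "blah blah blah\n"
    "blah ........blah\n"
    "blah blah\n"
    "....\n"
    "..blah blah\n"
    "END\n"
)

# (?s) lets . match newlines; the lazy .*? stops at the first END.
with_s = re.search(r"(?s)^PRPM(.*?END)", RAW)

# Without (?s), . stops at the first newline, so END on a later line is unreachable.
without_s = re.search(r"^PRPM(.*?END)", RAW)

print(with_s.group(1).splitlines()[-1])  # END
print(without_s)                          # None
```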

itinney · ‎01-20-2012

Yes, Ayn is correct. The non-greedy match fixes it, although you should not need it. This config works for me:

[mytype]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^CurrentTime:
EXTRACT-result = ^(?m)Result: (?P<result>.+?)$
EXTRACT-ipaddress = ^(?m)IPADDRESS: (?P<ipaddress>.+?)$
EXTRACT-floor = ^(?m)Floor: (?P<floor>.+?)$

BUT, so does this:

[mytype]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^CurrentTime:
EXTRACT-result = ^(?m-s)Result: (?P<result>.+)$
EXTRACT-ipaddress = ^(?m-s)IPADDRESS: (?P<ipaddress>.+)$
EXTRACT-floor = ^(?m-s)Floor: (?P<floor>.+)$

The problem appears to be that the 's' modifier is 'on' by default! It should not be if we're using PCRE.
The 's' modifier says that the '.' character will also match newline characters (i.e. \r or \n).
The first config above works because we are saying do a non-greedy match.
The second config above works because we are saying do not allow the '.' to match a newline char.

For some reason Splunk is behaving as though we had said (?sm). Anyway we have a fix!
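A minimal sketch reproducing the symptom and both fixes in Python's re. The assumption is that passing re.S emulates the "s on by default" behaviour described above. Python does not accept the global (?m-s) form, so the second fix uses the scoped group (?-s:...) (Python 3.7+), which turns dot-all off for the dot inside it. (?m) is also placed at the start of the pattern, because recent Python versions reject global flags that appear after ^.

```python
import re

# Second record from the sample log, with the following PID line merged in,
# similar to the rows in the tables in the question.
EVENT = (
    "CurrentTime: 2012/01/20 16:23:54\n"
    "Username: username51\n"
    "Floor: floor3\n"
    "IPADDRESS: 10.1.1.32\n"
    "Result: fail\n"
    "PID: 8020"
)

# Assumption: emulate "dot matches newline is on by default" with re.S.
DOTALL = re.S

def extract(pattern: str) -> str:
    """Return the 'floor' group, or an empty string if nothing matched."""
    m = re.search(pattern, EVENT, DOTALL)
    return m.group("floor") if m else ""

greedy = extract(r"(?m)^Floor: (?P<floor>.+)$")           # .+ runs to the end of the event
lazy = extract(r"(?m)^Floor: (?P<floor>.+?)$")            # stops at the first line end
no_dotall = extract(r"(?m)^Floor: (?P<floor>(?-s:.+))$")  # . cannot cross a newline here

print(repr(greedy))  # 'floor3\nIPADDRESS: 10.1.1.32\nResult: fail\nPID: 8020'
print(lazy)          # floor3
print(no_dotall)     # floor3
```

The greedy result is the same kind of run-on value shown in the question's tables, where the newlines appear as spaces.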

Note I changed the BREAK_ONLY_BEFORE because the PID does not appear in every record.
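To see why the breaking pattern matters, here is a simplified sketch of "start a new event only before lines matching X" applied to the sample log. The function break_only_before is hypothetical and ignores timestamps, MAX_EVENTS, and every other Splunk setting; it is not how Splunk implements line merging.

```python
import re

LOG = """PID: 3047
CurrentTime: 2012/01/20 16:23:55
Username: username45
Floor: floor7
IPADDRESS: 10.1.1.4
Result: success
CurrentTime: 2012/01/20 16:23:54
Username: username51
Floor: floor3
IPADDRESS: 10.1.1.32
Result: fail
PID: 8020
CurrentTime: 2012/01/20 16:23:53
Username: username67
Floor: floor8
IPADDRESS: 10.1.1.24
Result: success
Additional: Some more information"""

def break_only_before(raw: str, pattern: str) -> list[str]:
    """Hypothetical helper: merge lines, starting a new event only
    before a line that matches pattern (the first line always starts one)."""
    breaker = re.compile(pattern)
    events: list[list[str]] = []
    for line in raw.splitlines():
        if not events or breaker.match(line):
            events.append([line])
        else:
            events[-1].append(line)
    return ["\n".join(lines) for lines in events]

# Breaking before PID: glues the second record onto the first (2 events).
print(len(break_only_before(LOG, r"^PID:")))

# Breaking before CurrentTime: gives one event per record, plus the leading
# PID line on its own; later PID lines end up at the end of the previous record.
for event in break_only_before(LOG, r"^CurrentTime:"):
    print(event.splitlines()[0])
```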

Ayn solved your problem; I'm just clarifying why it didn't work.

nzambo_splunk · ‎03-21-2017

Just another huge +1 for the -s. Very helpful.

markmcd · ‎08-08-2013

Huge +1 for the "s" modifier. It had me stuck 🙂

melonman · ‎01-20-2012

Thanks for the helpful comment!

melonman · ‎01-20-2012

Thank you for your help!
Now I can get what I wanted. I added "?" as you pointed out.

Ayn · ‎01-20-2012

I'll admit I don't know why (?m) doesn't seem to work in your case; it should! Your second example could possibly work anyway if you changed the regex a bit. Right now you're performing a greedy match, so the regex will match as much as it possibly can. You need to change it to a non-greedy version by adding a ? after the + quantifier. Like this:

EXTRACT-ipaddress = (?m)^IPADDRESS: (?P<ipaddress>.+?)[\r\n]
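A short sketch of this non-greedy form in Python's re, again assuming dot-all is on (emulated with re.S, as the thread suggests Splunk behaved). One caveat worth checking: [\r\n] requires an actual newline character, so a field on the last line of an event, which has no trailing newline, does not match, while the (?m) $ anchor also matches at the end of the event. The Additional field name is only used here for illustration.

```python
import re

# Last lines of a merged event from the sample log; no trailing newline.
EVENT = (
    "IPADDRESS: 10.1.1.24\n"
    "Result: success\n"
    "Additional: Some more information"
)

# Lazy .+? stops at the first newline even with dot-all on.
ip = re.search(r"(?m)^IPADDRESS: (?P<ipaddress>.+?)[\r\n]", EVENT, re.S)
print(ip.group("ipaddress"))  # 10.1.1.24

# Caveat: the last line has no newline after it, so [\r\n] finds nothing.
last = re.search(r"(?m)^Additional: (?P<additional>.+?)[\r\n]", EVENT, re.S)
print(last)  # None

# In multi-line mode, $ also matches at the end of the event.
last_eol = re.search(r"(?m)^Additional: (?P<additional>.+?)$", EVENT, re.S)
print(last_eol.group("additional"))  # Some more information
```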

melonman · ‎01-20-2012

Thanks. I tried switching to multi-line mode, but still no luck.

With EXTRACT-ipaddress = (?m)^IPADDRESS: (?P<ipaddress>.+)$,
"sourcetype="mytype" | head 1 | table ipaddress" still returns:

1 10.1.1.21 Result: success
2 10.1.1.34 Result: fail PID: 9360
3 10.1.1.9 Result: fail PID: 6634

With EXTRACT-ipaddress = (?m)^IPADDRESS: (?P<ipaddress>.+)[\r\n],
"sourcetype="mytype" | head 1 | table ipaddress" still returns:

1 10.1.1.21
2 10.1.1.34 Result: fail
3 10.1.1.9 Result: fail

While I read up on the regex website, I would still like to know how to get this right.
