Regex: I want to match a string and then extract the next lines until matching another string

edrivera3
Builder

Hi,

I have the following in all my events:

ERROR=40392
"This error ... blah...blah....
... ... .. ... ... .. ... ..... ..
... .. ... ... .. . ..."
END

I would like to extract everything between "ERROR=40392" and "END" into a field. Also, the error number changes for each event. I would appreciate your help.

1 Solution

rsennett_splunk
Splunk Employee

edrivera3,

First, let me recommend you check out regex101.com because it will show you exactly what your regex is capturing and what it's not. It also explains every step of your regex. Very helpful for learning.

Since you mention that the error will have "different numbers", I think it's worth pointing out that regex is pattern matching. So sometimes you will notate literal things like ERROR= and sometimes you will use representations like \d for a digit and \d+ for one or more digits. It helps to be precise when you can. So even if the numbers were different, if you always have a five-digit error code, the regex for just that would look like this: ERROR=\d{5} which translates to literally ERROR= followed by exactly five digits. In this case you also spell out the part you don't want to capture but do want to make sure is present, which is: ERROR=\d+\s+\"
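To see the difference between \d+ and \d{5} in action, here is a minimal sketch in Python. Python's re module is only a stand-in for the PCRE engine Splunk uses, and the two one-line events are made up for illustration; only the ERROR=<digits> prefix comes from the question.

```python
import re

# Hypothetical one-line events; only the ERROR=<digits> prefix is from the question.
events = ['ERROR=40392 "disk full"', 'ERROR=7 "short code"']

any_digits = re.compile(r'ERROR=\d+\s+"')     # one or more digits
five_digits = re.compile(r'ERROR=\d{5}\s+"')  # exactly five digits

matches_any = [bool(any_digits.search(e)) for e in events]
matches_five = [bool(five_digits.search(e)) for e in events]

print(matches_any)   # [True, True]
print(matches_five)  # [True, False]
```

The stricter \d{5} form rejects the one-digit code, which is what you want only if the code length really is fixed.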

Then this could get tricky: your sample seems to have line breaks. So while it might seem like a good idea to use a dot (which represents any character) and say .+, that would only work for one line of the message, since the dot actually represents any character except newline, and it looks like you have newlines. So here's the trick: there are flags that you can apply to the regex (see the regex101 explanation). For example, prefix your regex with (?i) and that tells Splunk that you want the regex to be case insensitive.

In this case you'll use the s flag (often written /s; inside the pattern itself it is written (?s)). To have the .+ include newlines (that is, represent all characters including newline), you code it like this:

(?s)ERROR=\d+\s+\"(?P<myfield>.+)\"\s+END

which says:
Look at this as if everything is a single line (the (?s) flag lets the dot match newlines)
Walk past the following literal characters: ERROR=
Then walk past one or more digits, followed by one or more whitespace characters (spaces or newlines) and a literal double quote
Then create a field capturing group called "myfield"
Inside the field you put one or more characters
But don't include the closing double quote, the one or more whitespace characters that follow, or the literal word END
That last bit sort of anchors the field as ending before the combination of double quote, whitespace and END. Sometimes you have to be more specific than that (if there are other things in the event that look very close to that ending), but it's fine here if that's really what your events look like.
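If you want to check the walkthrough above outside of Splunk, here is a small Python sketch. Python's re module stands in for Splunk's PCRE engine (both accept (?s) and the (?P<name>...) group syntax), and the message text in the event is a placeholder I made up to mirror the shape of the sample in the question.

```python
import re

# Event shaped like the sample in the question; the message text is a placeholder.
event = 'ERROR=40392\n"This error line one\nline two\nline three"\nEND'

pattern = r'ERROR=\d+\s+"(?P<myfield>.+)"\s+END'

without_flag = re.search(pattern, event)        # dot stops at the first newline
with_flag = re.search('(?s)' + pattern, event)  # dot also matches newlines

print(without_flag)                # None
print(with_flag.group('myfield'))  # the three message lines, quotes excluded
```

Without (?s) the match fails because .+ cannot cross the line break to reach the closing quote; with it, the whole multi-line message lands in myfield.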

You can use that regex to extract a search-time field (in the GUI, Settings > Fields > Field extractions, and that will be placed into props.conf).
Or you can use it for a rex in your search:

... | rex "(?s)ERROR=\d+\s+\"(?P<myfield>.+)\"\s+END" | head 1 | table myfield

In your research you may have come across something like .* as well as .+
The .* means zero or more characters, and it is greedy, meaning it will keep going as far as it can while still letting the rest of the pattern match.
The .+ means one or more characters, and it is just as greedy. 🙂 (Put a ? after either one, as in .+?, to make it stop at the first possible match instead.)
In this case, either is good... but you only use the * when you really need it (that is, when you think you might have zero characters).
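Greediness only starts to matter when an event could contain more than one closing quote followed by END. The sketch below uses Python's re module as a stand-in for Splunk's PCRE engine, with a made-up event that holds two ERROR ... END blocks (the question says each event has just one, so this is purely an illustration).

```python
import re

# Hypothetical event containing two ERROR ... END blocks.
event = 'ERROR=1 "first" END ERROR=2 "second" END'

greedy = re.search(r'(?s)ERROR=\d+\s+"(?P<myfield>.+)"\s+END', event)
lazy = re.search(r'(?s)ERROR=\d+\s+"(?P<myfield>.+?)"\s+END', event)

print(greedy.group('myfield'))  # first" END ERROR=2 "second
print(lazy.group('myfield'))    # first
```

The greedy .+ runs to the last quote-plus-END it can find, while the lazy .+? stops at the first one.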

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

dflodstrom
Builder

For this sample log entry:

ERROR=40392 "This error blah blah" END

It would be possible to use rex inline like this (rex defaults to the field _raw unless you specify otherwise; the double quotes inside the quoted regex have to be escaped):

<your search> | rex "ERROR=\d+\s\"(?<new_field>.+)\"\sEND"

You will end up with: new_field=This error blah blah

You can put that into props.conf for a search-time extraction:

EXTRACT-your_extract = ERROR=\d+\s"(?<new_field>.+)"\sEND

mohan401
Engager

For this sample log entry
dkf:fhj fjff jffj from IP 11.11.111.11. jdjd"\n

stephane_cyrill
Builder

Hi edrivera3,
rex (a regex) is the best tool for that. Try this to extract, for example, several property values and put them all in one field:

... | rex max_match=0 field=_raw "HERE YOU PUT YOUR REGEX"

If, like me, you cannot easily write regex, use the IFX (interactive field extractor): go through it as if you wanted to extract the values, and it will give you a regular expression that you can use there.
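As a rough analogy for what max_match=0 does (keep every match instead of only the first, giving a multivalue field), here is a Python sketch that collects all matches in one string. Python's re module is only a stand-in for Splunk's regex engine, and the event with two ERROR ... END blocks is made up for illustration.

```python
import re

# Hypothetical event containing two ERROR ... END blocks.
event = 'ERROR=1 "first" END ERROR=2 "second" END'

pattern = re.compile(r'(?s)ERROR=\d+\s+"(?P<myfield>.+?)"\s+END')

# Collect the named group from every match, not just the first one.
all_values = [m.group('myfield') for m in pattern.finditer(event)]

print(all_values)  # ['first', 'second']
```

Note the lazy .+? here: with a greedy .+ there would be only one long match to collect.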
