Regex field extraction in props.conf: in which search-time field are the extracted values stored?

edoardo_vicendo
Contributor

Hi All,

I have some questions about the regular-expression field extractions that can be added in props.conf.
Suppose I have indexed multi-line files in Splunk that, at some (not fixed) point, contain the following pattern, and I need to extract the "nameoftheuser" and the "nameofthejob":

= USER      : nameoftheuser AAA = JOB      : nameofthejob

I know I can do it this way:

EXTRACT-USER = ^(?m)= USER      : (?P<USER>\w+)
EXTRACT-JOB  = ^(?m)= USER      : \w+ AAA = JOB      : (?P<JOB>\w+)

or even this way:

EXTRACT-USER,JOB = ^(?m)= USER      : (?P<USER>\w+) AAA = JOB      : (?P<JOB>\w+)

My first question, referring to the props.conf documentation:

Given that EXTRACT-USER is the <class> and (?P<USER>\w+) is the <regex>, is the field named after the class or after the capture group in the regex? To be clearer:

EXTRACT-TEST1,TEST2 = ^(?m)= USER      : (?P<USER>\w+) AAA = JOB      : (?P<JOB>\w+)

At search time, will the USER and JOB values be stored in fields named TEST1 and TEST2, or in fields named USER and JOB?

Second question: I do not understand exactly what this passage in the props.conf documentation means:

Use '<regex> in <src_field>' to match the regex against the values of a
specific field. Otherwise it just matches against _raw (all raw event
data).

I understand it is advice for improving the performance of field extraction, but I do not see exactly how to take advantage of it. Can someone explain it to me?

Thanks a lot,
Edoardo

FrankVl
Ultra Champion

The data goes into the field named by the capture group, not by the class. So EXTRACT-TEST1,TEST2 = ^(?m)= USER      : (?P<USER>\w+) AAA = JOB      : (?P<JOB>\w+) puts the data in the USER and JOB fields.

The '<regex> in <src_field>' form can be used if you already have fields extracted (e.g. at index time, or by preceding EXTRACT items). You can then apply a further extraction to those previously extracted fields. For example, a log may contain some header fields followed by a message, and the message itself contains further details. You can then define one extraction that gets the header and message fields, and a second extraction that takes the message field as input and extracts the details from it.
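To make the header-then-message case concrete, here is a minimal sketch. It assumes a hypothetical sourcetype my_app_log whose events look like "2024-01-01 12:00:00 ERROR login failed user=alice". The stanza name, the class names and the field names (log_level, message, failed_user) are all made up for illustration and are not from the original question.

```
# props.conf (illustrative sketch; sourcetype, classes and field names are hypothetical)
[my_app_log]
# First extraction: no "in <src_field>", so it is matched against _raw.
# It produces the log_level and message fields.
EXTRACT-a_header = ^\S+ \S+ (?P<log_level>[A-Z]+) (?P<message>.+)$
# Second extraction: matched only against the value of the message field.
# It produces the failed_user field.
EXTRACT-b_details = user=(?P<failed_user>\w+) in message
```

As far as I know, the inline extractions of a stanza are applied in ASCII sort order of their class names, which is why the classes are named a_header and b_details here, so that message exists before the second extraction reads it. Please check this against the props.conf spec for your Splunk version before relying on it.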

edoardo_vicendo
Contributor

Hi FrankVl,

Thanks a lot for your answer. Could you confirm whether the following regex is more efficient in terms of performance:

EXTRACT-TEST1,TEST2 = ^(?m)= USER      : (?P<USER>\w+) AAA = JOB      : (?P<JOB>\w+)

compared to the one below, which is split into two separate regexes:

EXTRACT-USER = ^(?m)= USER      : (?P<USER>\w+)
EXTRACT-JOB  = ^(?m)= USER      : \w+ AAA = JOB      : (?P<JOB>\w+)

Thanks a lot,
Edoardo

FrankVl
Ultra Champion

I would guess so, but I don't know enough about the nitty-gritty technical details of how regex processing works under the hood to give you an authoritative answer.
