Regex field extraction in props.conf: in which search-time field are the values stored?

edoardo_vicendo
Contributor

Hi All,

I have some questions about the regular-expression extractions that can be added in props.conf.
Suppose I have indexed multi-line files in Splunk that, at some (not fixed) point, contain the following pattern, and I need to extract "nameoftheuser" and "nameofthejob":

= USER      : nameoftheuser AAA = JOB      : nameofthejob

I know I can do it this way:

EXTRACT-USER = ^(?m)= USER      : (?P<USER>\w+)
EXTRACT-JOB  = ^(?m)= USER      : \w+ AAA = JOB      : (?P<JOB>\w+)

or even this way:

EXTRACT-USER,JOB = ^(?m)= USER      : (?P<USER>\w+) AAA = JOB      : (?P<JOB>\w+)

My first question refers to the props.conf documentation:

Given that EXTRACT-USER is the <class> and (?P<USER>\w+) is part of the <regex>, is the field named after the class or after the capture group in the regex? To be clearer:

EXTRACT-TEST1,TEST2 = ^(?m)= USER      : (?P<USER>\w+) AAA = JOB      : (?P<JOB>\w+)

At search time, will the USER and JOB values be stored in fields named TEST1 and TEST2, or in fields named USER and JOB?

Second question: I do not understand exactly what this passage in the props.conf documentation means:

Use '<regex> in <src_field>' to match the regex against the values of a
specific field. Otherwise it just matches against _raw (all raw event
data).

I understand it is advice for improving field-extraction performance, but I do not see how to take advantage of it. Can someone explain it to me?

Thanks a lot,
Edoardo


FrankVl
Ultra Champion

The data goes into the fields named by the capture groups, not by the class. So EXTRACT-TEST1,TEST2 = ^(?m)= USER      : (?P<USER>\w+) AAA = JOB      : (?P<JOB>\w+) puts the data in the USER and JOB fields.
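
To illustrate the point about capture-group names, here is a minimal sketch using Python's re module rather than Splunk's own regex engine, so it only demonstrates the naming behaviour, not Splunk itself. The sample event text around the USER/JOB line is made up, (?m) is moved in front of ^ because recent Python versions reject inline flags that are not at the start of the pattern, and \s+ replaces the fixed run of spaces so the sketch does not depend on exact spacing.

```python
import re

# Hypothetical multi-line event; only the middle line comes from the question.
event = (
    "some header line\n"
    "= USER      : nameoftheuser AAA = JOB      : nameofthejob\n"
    "some trailing line"
)

# Assumption: adapted from the EXTRACT regex in the question.
# (?m) comes first (Python requirement) and \s+ stands in for the run of spaces.
pattern = re.compile(r"(?m)^= USER\s+: (?P<USER>\w+) AAA = JOB\s+: (?P<JOB>\w+)")

match = pattern.search(event)

# The resulting field names are taken from the named groups in the regex.
# Nothing like "TEST1" or "TEST2" appears, because a class name is not part of the regex.
fields = match.groupdict() if match else {}

print(fields)  # {'USER': 'nameoftheuser', 'JOB': 'nameofthejob'}
```

The class part of EXTRACT-<class> only labels the extraction in props.conf; the sketch has no equivalent for it because it plays no role in naming the fields.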

The "<regex> in <src_field>" form can be used when fields have already been extracted (e.g. at index time, or by preceding EXTRACT items). You can then apply a further extraction to those previously extracted fields. For example, a log may contain some header fields followed by a message, and that message may itself contain further details. You can define a first extraction that gets the header and message fields, and a second extraction that takes the message field as input and extracts the details from it.
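
The two-stage idea can be sketched as follows, again in Python's re module as a stand-in for Splunk, so this is an illustration under assumptions and not how Splunk implements it. The event format and the field names log_time, level, message, job and user are all hypothetical. In props.conf, following the syntax quoted in the question, the second stage would presumably be written as an EXTRACT whose value ends with "in message".

```python
import re

# Hypothetical event: a timestamp, a level, then a free-text message with details.
raw = "2024-05-01T10:00:00 ERROR job=backup user=alice failed with code 7"

# Stage 1: matched against the whole raw event (the default, i.e. _raw).
header_re = re.compile(r"^(?P<log_time>\S+) (?P<level>\w+) (?P<message>.*)$")

# Stage 2: matched only against the value of the already extracted "message" field,
# which mimics "<regex> in <src_field>" with src_field = message.
detail_re = re.compile(r"job=(?P<job>\w+) user=(?P<user>\w+)")

fields = {"_raw": raw}

m = header_re.search(fields["_raw"])
if m:
    fields.update(m.groupdict())

# The second extraction can only run if the source field exists.
src = fields.get("message")
if src is not None:
    m = detail_re.search(src)
    if m:
        fields.update(m.groupdict())

print(fields["message"])             # job=backup user=alice failed with code 7
print(fields["job"], fields["user"])  # backup alice
```

The second regex never sees the timestamp or the level, only the message text, which is the practical benefit: it can be simpler and cannot accidentally match something in the header.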


edoardo_vicendo
Contributor

Hi FrankVl,

Thanks a lot for your answer. Could you confirm whether the following regex is more efficient in terms of performance:

EXTRACT-TEST1,TEST2 = ^(?m)= USER      : (?P<USER>\w+) AAA = JOB      : (?P<JOB>\w+)

compared to the version below, which is split into two separate regexes:

EXTRACT-USER = ^(?m)= USER      : (?P<USER>\w+)
EXTRACT-JOB  = ^(?m)= USER      : \w+ AAA = JOB      : (?P<JOB>\w+)

Thanks a lot,
Edoardo


FrankVl
Ultra Champion

I would guess so, but I don't know enough of the nitty gritty technical details of how all that regex stuff works under the hood to give you an authoritative answer on that.
