Splunk Search

Why is my regular expression not working in the rex command? It has been tested as a standalone regex outside of Splunk.

oliverj
Communicator

I am attempting to parse a Solaris log file into key/value pairs. The log is:

pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>

The result I am looking for will be:

Authentication = succeeded
for = active directory
user = bobtheperson
account = bobtheperson@com.com
reason = N/A
Access cont(upn) = bob

My testing shows that the expression [\>\:]*\s+(.*?)\:?\s\<(.+?)\> should work.
http://regexr.com/3fatg

In Splunk, I put this regular expression into a search that returned the log in question.

mysearch | rex field=_raw "[\>\:]*\s+(.*?)\:?\s\<(.+?)\>"

It returned an error:

Error in rex command. The regex does not extract anything. It should specify at least one named group.

Can you help me turn this into an actual key/value pair list of results?

1 Solution

woodcock
Esteemed Legend

There is no way to do KVP matching with rex (yes, I tested the _KEY_1), but you can easily do it if you put it in transforms.conf like this:

  • REGEX and the FORMAT attribute:
    • Name-capturing groups in the REGEX are extracted directly to fields. This means that you do not need to specify the FORMAT attribute for simple field extraction cases (see the description of FORMAT, below).
    • If the REGEX extracts both the field name and its corresponding field value, you can use the following special capturing groups if you want to skip specifying the mapping in FORMAT: _KEY_<string>, _VAL_<string>.
    • For example, the following are equivalent:
      • Using FORMAT:
        REGEX = ([a-z]+)=([a-z]+)
        FORMAT = $1::$2
      • Without using FORMAT:
        REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
    • When using either of the above formats, in a search-time extraction, the regex will continue to match against the source text, extracting as many fields as can be identified in the source text.

So for you, it is like this:

In props.conf:

[MyFunkySourcetype]
TRANSFORMS-MyFunkyKVP = MyFunkyKVP

In transforms.conf:

[MyFunkyKVP]
REGEX = [\>\:]*\s+(.*?)\:?\s\<(.+?)\>
FORMAT = $1::$2
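
To see what the REGEX and FORMAT = $1::$2 pairing is expected to produce, here is a minimal Python sketch that applies the same regex repeatedly to the sample line from the question and pairs group 1 with group 2. It only emulates the idea with Python's re module; it is not Splunk's extraction code, and the names LINE, KV_REGEX, and extract_pairs are made up for this illustration.

```python
import re

# Sample event body copied from the question (no timestamp or syslog header).
LINE = (
    "pam_vas: Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)

# The regex from the thread, unchanged.
KV_REGEX = re.compile(r"[\>\:]*\s+(.*?)\:?\s\<(.+?)\>")


def extract_pairs(text):
    """Emulate FORMAT = $1::$2: collect (group 1, group 2) for every match."""
    return [(m.group(1), m.group(2)) for m in KV_REGEX.finditer(text)]


pairs = extract_pairs(LINE)
for key, value in pairs:
    print(f"{key} = {value}")
```

On this body-only line the sketch yields the six pairs listed in the question, which suggests the regex itself is sound when the event contains nothing but the message body.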

oliverj
Communicator

Event type has been defined as "foo". All configuration taking place in etc/system/local
Tested:
1)
props.conf

[foo]
EXTRACT-MyFunkyKVP = [\>\:]*\s+(?<_KEY_1>.*?)\:?\s\<(?<_KEY_2>.+?)\>

Nothing in transforms.conf.

2)
props.conf

 [foo]
 TRANSFORMS-MyFunkyKVP = MyFunkyKVP

transforms.conf

 [MyFunkyKVP]
 REGEX = [\>\:]*\s+(.*?)\:?\s\<(.+?)\>
 FORMAT = $1::$2

Neither way seems to generate any result (Searching in verbose mode)

Running btool list against my props and transforms makes it look like the conf files are being applied to sourcetype foo.

woodcock
Esteemed Legend

Switch TRANSFORMS- to REPORT- to make it apply to ALL events (indexed in the past and in the future) at search-time by deploying on the Search Head. The way that you have it now will only apply to events at index-time (i.e. events indexed after you deploy the new configurations and restart splunkd on the indexers).

oliverj
Communicator

Never mind! My regular expression didn't account for timestamps or other headers (I'm very new to regex), only the body of the message.
I edited an event to remove the header, and it performed some extractions, so I know that REPORT- is indeed working. Thank you!
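
The header problem can be reproduced outside Splunk. The Python sketch below prepends an invented syslog-style header (the real header is not shown in this thread) to part of the sample message and shows that the first key swallows the header. TIGHTER is a hypothetical variant, not something proposed in the thread and not tested in Splunk: it assumes keys never contain <, >, or :, and uses a lookbehind so a key can only start after a > or : character.

```python
import re

# The regex from the thread, unchanged.
ORIGINAL = re.compile(r"[\>\:]*\s+(.*?)\:?\s\<(.+?)\>")

# Hypothetical tighter variant (an assumption for illustration only):
# the key may not contain '<', '>' or ':', and must follow a '>' or ':'.
TIGHTER = re.compile(r"(?<=[>:])\s+([^<>:]+?):?\s<([^>]+)>")

# Invented header; the actual header format is not given in the thread.
HEADER = "Mar 13 10:15:01 host1 sshd[123]: "
BODY = ("pam_vas: Authentication <succeeded> for <active directory> "
        "user: <bobtheperson>")
EVENT = HEADER + BODY


def first_key(pattern, text):
    """Return the key captured by the first match, or None if no match."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(repr(first_key(ORIGINAL, BODY)))    # body only: clean first key
print(repr(first_key(ORIGINAL, EVENT)))   # with header: key includes the header
print([m.group(1) for m in TIGHTER.finditer(EVENT)])
```

With the original regex, the lazy (.*?) starts at the first whitespace in the event and keeps growing until it reaches the first " <", so everything from the timestamp onward ends up in the first key. Restricting which characters a key may contain is one way to avoid that; anchoring on a literal such as pam_vas: would be another.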

skoelpin
SplunkTrust

You have to give the field a name in your capture group.

Add (?<FIELDNAME>) to your capture group and it will work in Splunk.

Try something like this:

mysearch | rex field=_raw "[\>\:]*\s+(?<Field1>.*?)\:?\s\<(?<Field2>.+?)\>"
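
As a rough illustration of what naming the groups changes, the Python sketch below runs the same pattern against the sample line from the question. Python spells named groups (?P<name>...) rather than (?<name>...), so the pattern is adapted accordingly; this is an emulation with Python's re module, not the rex command itself.

```python
import re

# Sample event body copied from the question.
LINE = (
    "pam_vas: Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)

# Same pattern as the rex example, using Python's (?P<name>...) syntax.
NAMED = re.compile(r"[\>\:]*\s+(?P<Field1>.*?)\:?\s\<(?P<Field2>.+?)\>")

# A single search returns only the first match.
first = NAMED.search(LINE).groupdict()
print(first)

# Iterating over all matches still reuses the same two field names.
all_matches = [m.groupdict() for m in NAMED.finditer(LINE)]
print(len(all_matches), sorted(all_matches[0]))
```

The field names stay fixed as Field1 and Field2 no matter what the log calls each key, so named groups make the command run but do not by themselves produce fields named Authentication, user, and so on. That dynamic naming is what the $1::$2 mapping in transforms.conf is for.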

badr_boukari
Explorer

Hey,

I had the same problem, and I tested this technique; it works well.

This is the right answer:

index=pan_logs sourcetype=pan:system event_id="auth-fail" | rex field=description "user\s'(?<user>\w+)'\."

You can see exactly how in this video: https://www.youtube.com/watch?v=ppSxpzK2sj8&ab_channel=Splunk%26MachineLearning

Please close this discussion. Thanks!

Good luck!

oliverj
Communicator

It runs, but no matches.
And when I put it into "Field Extractor", the Field1 and Field2 tabs are empty as well.
