Splunk Search

Extract key-value pairs while ignoring "header" data using regex.

oliverj
Communicator

I have a regular expression that works on part of my data.
Given the log entry:

pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>

I can use the regular expression [\>\:]*\s+(.*?)\:?\s\<(.+?)\> and get the result I am looking for (http://regexr.com/3fatg):

Authentication = succeeded
for = active directory
user = bobtheperson
account = bobtheperson@com.com
reason = N/A
Access cont(upn) = bob

Unfortunately, when I was building this regular expression, I was ignoring a vital part of the log -- the first part.
The log actually looks like this:

Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>

My extraction no longer works right -- it is thrown off by the first part. (http://regexr.com/3fbod)
How would I exclude the beginning information from this log file?
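To make the failure concrete, here is a minimal sketch that runs the pattern from the post against both forms of the log line. It uses Python's re module as a stand-in for regexr.com (an assumption; Splunk itself uses PCRE, though this pattern behaves the same in both). The names ORIGINAL, BODY and HEADER are hypothetical and exist only for this illustration.

```python
import re

# The asker's original pattern, copied verbatim from the post.
ORIGINAL = re.compile(r"[\>\:]*\s+(.*?)\:?\s\<(.+?)\>")

# Sample data from the post, split into the syslog header and the body.
BODY = ("pam_vas: Authentication <succeeded> for <active directory> "
        "user: <bobtheperson> account: <bobtheperson@com.com> "
        "reason: <N/A> Access cont(upn): <bob>")
HEADER = "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] "

short_pairs = ORIGINAL.findall(BODY)           # body only: works as intended
full_pairs = ORIGINAL.findall(HEADER + BODY)   # full event: first key is polluted

print(short_pairs[0])
print(full_pairs[0])
```

On the body alone the first pair is ('Authentication', 'succeeded'). On the full event the lazy (.*?) starts at the first whitespace in the timestamp and keeps expanding until it reaches the first " <", so the first key swallows the whole header plus "pam_vas: Authentication". The remaining five pairs are unaffected.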

**Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info]** pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>

I think I need to start my search after the last occurrence of a ] (right before pam_vas), but I can't figure out how to exclude that.

DalJeanis
Legend

Looks like this might work

\s+(?<key1>[^\:\<\>]+)(?:\:?\s\<)(?<value1>[^\>]+)\>

on that site (regexr.com), it would be like this

\s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>
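A quick way to check this answer outside regexr.com is the sketch below, which applies both forms of the pattern to the full log line from the question. It assumes Python's re module, where named groups are written (?P<name>...) rather than the (?<name>...) form used above; LINE, PAIR and NAMED are hypothetical names for this illustration.

```python
import re

# Full event from the question, including the syslog header.
LINE = ("Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] "
        "pam_vas: Authentication <succeeded> for <active directory> "
        "user: <bobtheperson> account: <bobtheperson@com.com> "
        "reason: <N/A> Access cont(upn): <bob>")

# Unnamed-group form, as suggested for regexr.com.
PAIR = re.compile(r"\s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>")

# Named-group form; Python needs (?P<name>...) instead of (?<name>...).
NAMED = re.compile(r"\s+(?P<key1>[^\:\<\>]+)(?:\:?\s\<)(?P<value1>[^\>]+)\>")

pairs = PAIR.findall(LINE)
for key, value in pairs:
    print(f"{key} = {value}")

first = NAMED.search(LINE)
print(first.group("key1"), "=", first.group("value1"))
```

Because the key class excludes colons and the key must be followed directly by " <" or ": <", nothing in the header (or "pam_vas:") can start a match, so the six pairs come out clean.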

woodcock
Esteemed Legend

If the only thing that you must do is skip past every closing square bracket ( ] ), then you need a leading positive lookahead asserting that everything from the match position to the end of the event contains anything EXCEPT that character. Try this RegEx:

(?=[^\]]*$)[\>\:]*\s+(.*?)\:?\s\<(.+?)\>
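The effect of the lookahead can be sketched as follows, again assuming Python's re module in place of Splunk's PCRE engine (LINE and LOOKAHEAD are hypothetical names for this illustration). A match may only begin at a position with no ] between it and the end of the string, which here means at or after the space that precedes pam_vas.

```python
import re

# Full event from the question, including the syslog header.
LINE = ("Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] "
        "pam_vas: Authentication <succeeded> for <active directory> "
        "user: <bobtheperson> account: <bobtheperson@com.com> "
        "reason: <N/A> Access cont(upn): <bob>")

# (?=[^\]]*$) succeeds only where no "]" remains before end of string.
LOOKAHEAD = re.compile(r"(?=[^\]]*$)[\>\:]*\s+(.*?)\:?\s\<(.+?)\>")

pairs = LOOKAHEAD.findall(LINE)
for key, value in pairs:
    print(f"{key} = {value}")
```

The header is skipped, but the first key comes out as "pam_vas: Authentication" because the lazy (.*?) still starts right after the last ] and runs up to the first " <". The other five keys match the expected output.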

oliverj
Communicator

This one works as well, except for capturing the pam_vas text. But you call that out, of course, as it does not fit the others and the ] standard item. It is an odd variable.

woodcock
Esteemed Legend

So does this provide the solution or not?

oliverj
Communicator

It does -- I'm not quite sure what to do when more than one answer is valid, though.
I can only accept one.

woodcock
Esteemed Legend

IMHO, you should always up-vote correct answers and then select the BEST one by clicking Accept.

oliverj
Communicator

Upvoted! I will keep that in mind for next time.

DalJeanis
Legend

I designed this regex by building up from the right. The only decent/clear/permanent boundary was the value in angle brackets, so I started with

   \<(?<value1>[^\>]+)\>

That works to grab anything in angle brackets, NOT including other angle brackets.

Then I wanted to extend back to grab the colon, if any, getting this

(?:\:?\s\<)(?<value1>[^\>]+)\>

The "?:" was because we didn't want to capture that group, but I really wanted to think of it as a group.
Next, we had to deal with that last key field -- "Access cont(upn)" -- having a space and parenthesis in it.
Reviewing the rest of the key fields, I ended up deciding that the characters really could be anything but a colon or an angle bracket, getting this.

(?<key1>[^\:\<\>]+)(?:\:?\s\<)(?<value1>[^\>]+)\>

Note that excluding the colon also made sure that pam_vas would not be grabbed. That regex was grabbing everything I wanted, but also grabbing one space before the key field. So the final version became this.

\s+(?<key1>[^\:\<\>]+)(?:\:?\s\<)(?<value1>[^\>]+)\>
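The build-up described above can be replayed step by step. The sketch below assumes Python's re module and uses unnamed groups so that findall returns the captures directly; LINE and the stage1..stage4 names are hypothetical and exist only for this illustration.

```python
import re

# Full event from the question, including the syslog header.
LINE = ("Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] "
        "pam_vas: Authentication <succeeded> for <active directory> "
        "user: <bobtheperson> account: <bobtheperson@com.com> "
        "reason: <N/A> Access cont(upn): <bob>")

# Step 1: anything inside angle brackets.
stage1 = re.findall(r"\<([^\>]+)\>", LINE)

# Step 2: extend left over the optional colon and the space (non-capturing).
stage2 = re.findall(r"(?:\:?\s\<)([^\>]+)\>", LINE)

# Step 3: add the key as "anything but a colon or an angle bracket".
stage3 = re.findall(r"([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>", LINE)

# Step 4: consume the leading whitespace outside the key group.
stage4 = re.findall(r"\s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>", LINE)

print(stage1)
print([key for key, _ in stage3])   # keys still carry a leading space
print([key for key, _ in stage4])   # leading space gone
```

Steps 1 and 2 return the same six values. Step 3 returns the right pairs, but every key starts with a space (" Authentication", " for", and so on). Step 4 moves that space into \s+ so the keys are clean.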

oliverj
Communicator

I paste these into a site that explains the entire regex to me, and it just overwhelms me with what people can do with this tool. This looks like it will work -- testing now.

oliverj
Communicator

Given a Solaris BSM authentication log, I was able to extract key/value pairs using the following:

transforms.conf

[MyKVP]
REGEX = \s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>
FORMAT = $1::$2

props.conf

# sol_bsm is my sourcetype in this test
[sol_bsm]
REPORT-MyKVP = MyKVP
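For readers who want to see roughly what FORMAT = $1::$2 does with each match, here is a rough Python emulation. It is only a sketch under stated assumptions: extract_fields is a hypothetical helper, Python's \1::\2 template stands in for Splunk's $1::$2, and any key-name normalization Splunk applies (see the CLEAN_KEYS setting in transforms.conf) is not reproduced.

```python
import re

# Same REGEX as in the transforms.conf stanza above.
REGEX = re.compile(r"\s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>")

def extract_fields(event: str) -> dict:
    """Hypothetical helper: build a field dict from every match in the event."""
    fields = {}
    for match in REGEX.finditer(event):
        # Python's "\1::\2" plays the role of Splunk's "$1::$2" here.
        name, value = match.expand(r"\1::\2").split("::", 1)
        fields[name] = value
    return fields

EVENT = ("Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] "
         "pam_vas: Authentication <succeeded> for <active directory> "
         "user: <bobtheperson> account: <bobtheperson@com.com> "
         "reason: <N/A> Access cont(upn): <bob>")

fields = extract_fields(EVENT)
print(fields["user"], fields["reason"])
```

Splitting on the first "::" is safe here because the key group excludes colons.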

Thank you!

DalJeanis
Legend

You are quite welcome.

Yes, a complex regex still often looks like gobbledygook to me, and understanding what changes need to be made to use it in a .conf file instead of in a search is an adventure. This was a chance to explore positive and negative lookaheads, but I ended up not requiring them to meet your needs.

DalJeanis
Legend

Can you provide a couple more examples? Specifically, do all relevant log events contain "pam_vas:" or are there other items that potentially appear there?

oliverj
Communicator

At this time, all logs seem to have pam_vas. I can't be completely sure they will all have it though.
The format DOES seem consistent -- the "] " (not "] [") seems to be a good breaker (right before pam_vas)
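If "] " really is a reliable breaker, one alternative outside of a single regex is to cut the header off first and then run the original pattern on what remains. The sketch below assumes Python and that no value ever contains "] " (which would move the split point); LINE, ORIGINAL and payload are hypothetical names for this illustration.

```python
import re

# Full event from the question, including the syslog header.
LINE = ("Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] "
        "pam_vas: Authentication <succeeded> for <active directory> "
        "user: <bobtheperson> account: <bobtheperson@com.com> "
        "reason: <N/A> Access cont(upn): <bob>")

# The original pattern from the question, unchanged.
ORIGINAL = re.compile(r"[\>\:]*\s+(.*?)\:?\s\<(.+?)\>")

# Keep only the text after the LAST "] " in the event.
_header, _sep, payload = LINE.rpartition("] ")

pairs = ORIGINAL.findall(payload)
print(payload.split(":", 1)[0])   # the leading tag, "pam_vas"
print(pairs)
```

With the header removed, the payload starts at "pam_vas:" and the original pattern produces the same six pairs as it did on the shortened sample.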

DalJeanis
Legend

I ended up focusing in on the angle brackets as the only "fixed" item, and from there it expanded pretty easily to what you needed.

somesoni2
Revered Legend

Also, are the field names and their order always the same (authentication, user, account, etc.)?

oliverj
Communicator

Although the format stays the same, the log content may change. I need the "pairs" to be generic.
Edit: Looking at more logs, they all seem to be the same. I do hate to hard-code the key though, just in case things update.
