Splunk Search

Extract key-value pairs while ignoring "header" data using regex.

oliverj
Communicator

I have a regular expression that works on part of my data.
Given the log entry:

pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>

I can use the regular expression: [\>\:]*\s+(.*?)\:?\s\<(.+?)\> and get the result I am looking for. (http://regexr.com/3fatg)

Authentication = succeeded
for = active directory
user = bobtheperson
account = bobtheperson@com.com
reason = N/A
Access cont(upn) = bob

Unfortunately, when I was building this regular expression, I was ignoring a vital part of the log -- the first part.
The log actually looks like this:

Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>

My extraction no longer works right -- it is thrown off by the first part. (http://regexr.com/3fbod)
How would I exclude the beginning information from this log file?

**Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info]** pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>

I think I need to start my search after the last occurrence of a ] (right before pam_vas), but I can't figure out how to exclude everything before it.
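
To make the symptom concrete, here is a minimal Python sketch that runs the original pattern against the full log line. It assumes Python's `re` module behaves like Splunk's regex engine for this particular pattern; the variable names are only illustrative.

```python
import re

# The original pattern from the question, unchanged.
ORIGINAL = re.compile(r"[\>\:]*\s+(.*?)\:?\s\<(.+?)\>")

full_line = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: "
    "Authentication <succeeded> for <active directory> user: <bobtheperson> "
    "account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>"
)

pairs = ORIGINAL.findall(full_line)

# The lazy (.*?) starts right after the first whitespace and keeps growing
# until it reaches the first " <", so the first key swallows the whole header.
first_key = pairs[0][0]
print(repr(first_key))

# The remaining pairs are unaffected.
for key, value in pairs[1:]:
    print(f"{key} = {value}")
```

Only the first pair is wrong: because `.` accepts colons and square brackets, nothing stops the first key from starting in the syslog header.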

0 Karma
1 Solution

DalJeanis
Legend

Looks like this might work

\s+(?<key1>[^\:\<\>]+)(?:\:?\s\<)(?<value1>[^\>]+)\>

on that site (regexr.com), it would be like this

\s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>
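
A quick way to check this outside of regexr is a short Python sketch like the one below. It uses the unnamed-group version, since Python's `re` module spells named groups as `(?P<name>...)` rather than `(?<name>...)`; treating Python's matching as equivalent to Splunk's for this pattern is an assumption.

```python
import re

# Unnamed-group version of the suggested pattern.
KV = re.compile(r"\s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>")

full_line = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: "
    "Authentication <succeeded> for <active directory> user: <bobtheperson> "
    "account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>"
)

# Each match yields a (key, value) tuple; the header contains no "<",
# and a key may not contain ":", so nothing before "Authentication" can match.
pairs = KV.findall(full_line)
for key, value in pairs:
    print(f"{key} = {value}")
```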


0 Karma

woodcock
Esteemed Legend

If the only thing that you must do is skip past every closing square bracket ( ] ), then you need a leading positive lookahead that requires everything from the start of the match to the end of the event to contain anything EXCEPT that character. Try this RegEx:

(?=[^\]]*$)[\>\:]*\s+(.*?)\:?\s\<(.+?)\>
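
As a rough illustration of what the lookahead does, the Python sketch below applies this pattern to the full log line. It assumes Python's `re` module is a fair stand-in for Splunk's engine here; the names are illustrative only.

```python
import re

# The lookahead (?=[^\]]*$) only succeeds at positions where no "]"
# remains between that position and the end of the string.
LOOKAHEAD = re.compile(r"(?=[^\]]*$)[\>\:]*\s+(.*?)\:?\s\<(.+?)\>")

full_line = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: "
    "Authentication <succeeded> for <active directory> user: <bobtheperson> "
    "account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>"
)

pairs = LOOKAHEAD.findall(full_line)
for key, value in pairs:
    print(f"{key} = {value}")
```

The header is skipped, but the first key comes out as "pam_vas: Authentication", because the lazy `(.*?)` begins immediately after the last `]`.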

oliverj
Communicator

This one works as well, except for capturing the pam_vas text. But you call that out, of course, as it does not fit the others and the ] standard item. It is an odd variable.

0 Karma

woodcock
Esteemed Legend

So does this provide the solution or not?

0 Karma

oliverj
Communicator

It does -- I'm not quite sure what to do when there is more than one valid answer, though.
I can only accept one.

0 Karma

woodcock
Esteemed Legend

IMHO, you should always up-vote correct answers and then select the BEST one by clicking Accept.

0 Karma

oliverj
Communicator

Upvoted! I will keep that in mind for next time.

0 Karma

DalJeanis
Legend

I designed this regex by building up from the right. The only decent/clear/permanent boundary was the value in angle brackets, so I started with

   \<(?<value1>[^\>]+)\>

That works to grab anything in angle brackets, NOT including other angle brackets.

Then I wanted to extend back to grab the colon, if any, getting this

(?:\:?\s\<)(?<value1>[^\>]+)\>

The "?:" was because we didn't want to capture that group, but I really wanted to think of it as a group.
Next, we had to deal with that last key field -- "Access cont(upn)" -- having a space and parenthesis in it.
Reviewing the rest of the key fields, I ended up deciding that the characters really could be anything but a colon or an angle bracket, getting this.

(?<key1>[^\:\<\>]+)(?:\:?\s\<)(?<value1>[^\>]+)\>

Note that excluding the colon also made sure that pam_vas would not be grabbed. That regex was grabbing everything I wanted, but also grabbing one space before the key field. So the final version became this.

\s+(?<key1>[^\:\<\>]+)(?:\:?\s\<)(?<value1>[^\>]+)\>
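
The build-up described above can be replayed step by step. The following is a minimal Python sketch using unnamed groups (Python's `re` writes named groups as `(?P<name>...)`), and it assumes Python matches these patterns the same way Splunk does.

```python
import re

full_line = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: "
    "Authentication <succeeded> for <active directory> user: <bobtheperson> "
    "account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>"
)

# Step 1: grab anything inside angle brackets (values only).
values_only = re.findall(r"\<([^\>]+)\>", full_line)

# Step 3: add the key group, which may not contain ":", "<" or ">".
# Without a leading \s+ each key still carries the space in front of it.
with_space = re.findall(r"([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>", full_line)

# Final: consume the leading whitespace outside the key group.
final = re.findall(r"\s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>", full_line)

print(values_only)
print([key for key, _ in with_space])
print([key for key, _ in final])
```

Comparing the last two lists shows the only difference between the intermediate and final versions: the leading space on every key.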
0 Karma

oliverj
Communicator

I paste these into a site that explains the entire regex to me, and it just overwhelms me with what people can do with this tool. This looks like it will work -- testing now.

0 Karma

oliverj
Communicator

Given a Solaris BSM authentication log, I was able to extract key/value pairs using the following:

transforms.conf

[MyKVP]
REGEX = \s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>
FORMAT = $1::$2

props.conf

# my sourcetype in this test
[sol_bsm]
REPORT-MyKVP = MyKVP
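
For anyone who wants to sanity-check the pairing outside Splunk, here is a small Python sketch of what `FORMAT = $1::$2` amounts to: capture group 1 becomes the field name and group 2 its value. The function name `extract_fields` is hypothetical, and this is only an approximation of the search-time extraction, not Splunk's implementation.

```python
import re

# Same pattern as the REGEX setting in the transforms.conf stanza.
REGEX = re.compile(r"\s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>")


def extract_fields(event: str) -> dict:
    """Hypothetical helper: map group 1 to the field name, group 2 to the value."""
    return {m.group(1): m.group(2) for m in REGEX.finditer(event)}


event = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: "
    "Authentication <succeeded> for <active directory> user: <bobtheperson> "
    "account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>"
)

fields = extract_fields(event)
for name, value in fields.items():
    print(f"{name} = {value}")
```

Splunk may additionally clean extracted field names (see the `CLEAN_KEYS` setting in transforms.conf), so a key containing spaces or parentheses, such as "Access cont(upn)", may show up with underscores instead.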

Thank you!

0 Karma

DalJeanis
Legend

You are quite welcome.

Yes, a complex regex still often looks like gobbledygook to me, and understanding what changes need to be made to use it in a .conf file instead of in a search is an adventure. This was a chance to explore positive and negative lookaheads, but I ended up not requiring them to meet your needs.

0 Karma

DalJeanis
Legend

Can you provide a couple more examples? Specifically, do all relevant log events contain "pam_vas:" or are there other items that potentially appear there?

0 Karma

oliverj
Communicator

At this time, all logs seem to have pam_vas. I can't be completely sure they will all have it, though.
The format DOES seem consistent -- the "] " (not "] [") seems to be a good breaker (right before pam_vas)

0 Karma

DalJeanis
Legend

I ended up focusing in on the angle brackets as the only "fixed" item, and from there it expanded pretty easily to what you needed.

0 Karma

somesoni2
Revered Legend

Also, are the field names and their order always the same (authentication, user, account, etc.)?

0 Karma

oliverj
Communicator

Although the format stays the same, the log content may change. I need the "pairs" to be generic.
Edit: Looking at more logs, they all seem to be the same. I do hate to hard-code the key though, just in case things update.

0 Karma