Hi Splunkers & Splunkettes,
I am currently defining some sourcetypes for some db2 SMF logs and have finally got the field extractions working the way I want to via regex. To give you a snippet example:
EXTRACT-db2_header = (?m)^0(?<primauth>.{8})\s(?<connect>.{8})\s
On this event:
0=======================================================================================================
PRIMAUTH CONNECT INSTANCE END_USER WS_NAME TRANSACT
ORIGAUTH CORRNAME CONNTYPE RECORD TIME DESTNO IFC DESCRIPTION DATA
PLANNAME CORRNMBR TCB CPU TIME ID
-------- -------- ------------ -------------------------- --- -------------- --------------------------
0A1B2C3 SERVER X'123456789012' A12345 ABCD123 SQLA.exe
0Z9Y8X7 N/A REMOTE M 15:46:05 1234567890 140 Audit Auth Failures
0DISTSERV 'BLANK'
Gives me the following fields and their values:
primauth = "A1B2C3"
connect = "SERVER"
Now this appears to work fine. When I apply it to a larger sample size I get the following results for the primauth field in the field picker:
Values # %
-------------------------
A1B2C3 29,270 99.996%
Z9Y8X7 1 0.003%
Which is excellent because I need to find all instances where the primauth ISN'T 'A1B2C3'. HOWEVER, when I click on the value for 'Z9Y8X7' to add it to the search query, I get no results, despite Splunk telling me there is one value in my data set??? I've tried both:
sourcetype="db2_header" primauth="Z9Y8X7"
sourcetype="db2_header" Z9Y8X7
But both come up with no matches... am I missing something here? I realise that it's a statistically insignificant value, but so is a needle in a haystack and that's Splunk's bread & butter.
EDIT: To make matters a little weirder, I DO get the expected values when I enter this:
sourcetype="db2_header" NOT primauth="A1B2C3"
Thanks in advance 🙂
This should help clear some of the confusion, and explain why you're seeing the behaviour you're seeing. http://blogs.splunk.com/2011/10/07/cannot-search-based-on-an-extracted-field/
Thanks Ayn, good to know I was on the right path.
Right, I think I've made a little headway (I've gotta get out of the habit of asking questions then answering them myself 10 minutes later).
It has to do with the way Splunk performs its searching. The search function appears to work only from non-alphanumeric boundaries. Even though I've written my regex to skip the leading zero in the value for primauth, this doesn't fly for the search function, as it will always try to match a search term against the raw data. So while a search for:
primauth="Z9Y8X7"
won't work, a search for:
primauth="*Z9Y8X7"
WILL work, as the search function needs to deal with the leading zero, even if the regex doesn't.
To complicate matters, any NOT search I declare seems to take the primauth value AFTER regex extraction, hence the:
sourcetype="db2_header" NOT primauth="A1B2C3"
DOES work the way you'd expect.
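For anyone who wants to see the difference spelled out, here is a rough Python sketch of the behaviour described above. It is only a toy model, not how Splunk is actually implemented: the tokenizer (split on anything non-alphanumeric), the simplified extraction regex, the two sample raw strings, and the function names index_tokens, extract_primauth, search_eq and search_not_eq are all made up for illustration. The idea is that a positive field=value search first looks for the value as a whole token in the raw data and only then checks the extracted field, whereas a NOT search just compares the extracted values.

```python
import fnmatch
import re

# Hypothetical, simplified raw events (just the start of the PRIMAUTH line).
EVENTS = [
    "0A1B2C3  SERVER",
    "0Z9Y8X7  SERVER",
]

# Simplified stand-in for the EXTRACT regex: skip the leading 0, keep the rest.
PRIMAUTH_RE = re.compile(r"^0(?P<primauth>\S+)")


def index_tokens(raw):
    """Toy tokenizer: split the raw text on non-alphanumeric characters."""
    return {tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", raw) if tok}


def extract_primauth(raw):
    """Search-time field extraction: the leading zero is not part of the value."""
    match = PRIMAUTH_RE.match(raw)
    return match.group("primauth") if match else None


def search_eq(events, value):
    """Model of primauth=<value>: token lookup first, then the field check."""
    pattern = value.lower()
    hits = []
    for raw in events:
        # Step 1: the value (or wildcard pattern) must match a whole raw token.
        if not any(fnmatch.fnmatchcase(tok, pattern) for tok in index_tokens(raw)):
            continue
        # Step 2: the extracted field must match as well.
        field = extract_primauth(raw)
        if field is not None and fnmatch.fnmatchcase(field.lower(), pattern):
            hits.append(raw)
    return hits


def search_not_eq(events, value):
    """Model of NOT primauth=<value>: compares extracted values only."""
    wanted = value.lower()
    return [raw for raw in events if (extract_primauth(raw) or "").lower() != wanted]
```

In this model, search_eq(EVENTS, "Z9Y8X7") finds nothing because the only matching raw token is 0Z9Y8X7, while search_eq(EVENTS, "*Z9Y8X7") and search_not_eq(EVENTS, "A1B2C3") both return the second event.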
Not particularly intuitive, but good to know & understand. Hope this helps someone out 🙂