I am currently defining some sourcetypes for some db2 SMF logs (oh joy). Luckily, the fields are well defined and are fixed length, so extracting the values at the correct boundaries is not an issue. As an example, using this regex:
rex "(?m)^0(?<primauth>.{8})\s(?<connect>.{8})\s(?<instance>\S+)\s(?<end_user>.{16})\s(?<ws_name>.{29})\s(?<transact>.{50})\s\n0(?<origauth>.{8})\s(?<corrname>.{8})\s(?<conntype>.{12})\s"
On this event:
0=======================================================================================================
PRIMAUTH CONNECT INSTANCE END_USER WS_NAME TRANSACT
ORIGAUTH CORRNAME CONNTYPE RECORD TIME DESTNO IFC DESCRIPTION DATA
PLANNAME CORRNMBR TCB CPU TIME ID
-------- -------- ------------ -------------------------- --- -------------- --------------------------
0A1B2C3 SERVER X'123456789012' A12345 ABCD123 SQLA.exe
0Z9Y8X7 N/A REMOTE M 15:46:05 1234567890 140 Audit Auth Failures
0DISTSERV 'BLANK'
Gives me the following fields and their values (the quotes are mine obviously):
primauth = "A1B2C3 "
connect = "SERVER "
instance = "X'123456789012'"
end_user = "A12345 "
ws_name = "ABCD123 "
transact = "SQLA.exe "
...
You get the idea, and you probably see where I'm going with this.
How can I maintain the fixed field widths for extractions, but disregard trailing spaces in the values (something that you might do with "chomp" in perl). I would rather not have to append a wildcard every time I do a search for a specific field value just because I'm unsure how many spaces there are.
I very much suspect regex is the solution to my problems here but I've been scratching my head about this for a while.
Hoping someone can help 🙂
Ahhh... so once again I answer my own question.
Turns out that specifying the regular expression in your search query will capture spaces in the extracted values, but putting that same regex in your props.conf or transforms.conf will "chomp" the values for you while keeping your fixed-width field definitions.
props.conf:
[db2_system_parameters]
KV_MODE = none
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^0==
TIME_FORMAT = %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 800
REPORT-header_fields = db2_header_fields
transforms.conf:
[db2_header_fields]
REGEX = (?m)^0(?<primauth>.{8})\s(?<connect>.{8}) ...
Hope this helps someone else, as it wasn't behaviour I was aware of 🙂
beware that that solution works until you have AT LEAST one blank between two fields. It you have adjacent fields, with no space in between, then Splunk has problems in handling the two fields as two different strings.
I had that problem in a case where the log had fixed lenght fields but with adjacent fields(no separators).
For instance, envent il something like
xxxxxyyyyyyyyy....... where field1 are the first 5 chars, then field2 and so on.
In that case, fields extractions works, but if you try a search line "field1=xxxxx", that Splunk returns no result. To have some result you should write "field1=xxxxx*".
This is because Splunk indexes just one string made of "xxxxxyyyyyyy...." And if you have to search for field2, it's event worst: you must write "field2=yyyyyyyyy". As you can imagine, performance implications are terrible!
The solution I tried (and it worked perfectly) was to use a transform during the parsing phase and write the _raw adding a separator (. or | or a blank) between fields. Then Splunk indexes correctly the fields.
Marco
@Marcoscala
U wrote
"The solution I tried (and it worked perfectly) was to use a transform during the parsing phase and write the _raw adding a separator (. or | or a blank) between fields. Then Splunk indexes correctly the fields."
Can you please guide me how to do this? ..I am new in splunk.
Ahhh... so once again I answer my own question.
Turns out that specifying the regular expression in your search query will capture spaces in the extracted values, but putting that same regex in your props.conf or transforms.conf will "chomp" the values for you while keeping your fixed-width field definitions.
props.conf:
[db2_system_parameters]
KV_MODE = none
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^0==
TIME_FORMAT = %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 800
REPORT-header_fields = db2_header_fields
transforms.conf:
[db2_header_fields]
REGEX = (?m)^0(?<primauth>.{8})\s(?<connect>.{8}) ...
Hope this helps someone else, as it wasn't behaviour I was aware of 🙂
Why not just use a [^\s]+ to capture everything up until a space character? you can then use a \s+ to move ahead to the start of the next character set.
That was how I approached it initially, but unfortunately not all fields are mandatory so the fixed-width definitions have to be kept for data integrity. Thanks for the suggestion though 🙂