Splunk Search

Fixed-width field extraction but removing trailing spaces

rturk
Builder

I am currently defining some sourcetypes for some db2 SMF logs (oh joy). Luckily, the fields are well defined and are fixed length, so extracting the values at the correct boundaries is not an issue. As an example, using this regex:

rex "(?m)^0(?<primauth>.{8})\s(?<connect>.{8})\s(?<instance>\S+)\s(?<end_user>.{16})\s(?<ws_name>.{29})\s(?<transact>.{50})\s\n0(?<origauth>.{8})\s(?<corrname>.{8})\s(?<conntype>.{12})\s"

On this event:

0=======================================================================================================
 PRIMAUTH CONNECT  INSTANCE       END_USER      WS_NAME                       TRANSACT                                               
 ORIGAUTH CORRNAME CONNTYPE       RECORD TIME   DESTNO     IFC DESCRIPTION    DATA                                                   
 PLANNAME CORRNMBR                TCB CPU TIME             ID                                                                        
 -------- -------- ------------ -------------------------- --- -------------- --------------------------
0A1B2C3   SERVER   X'123456789012' A12345           ABCD123                       SQLA.exe                                           
0Z9Y8X7   N/A      REMOTE     M 15:46:05        1234567890 140 Audit Auth Failures                                                   
0DISTSERV 'BLANK' 

Gives me the following fields and their values (the quotes are mine obviously):

primauth = "A1B2C3  "
connect = "SERVER  "
instance = "X'123456789012'"
end_user = "A12345          "
ws_name = "ABCD123                      "
transact = "SQLA.exe                                          "
...

You get the idea, and you probably see where I'm going with this.

How can I maintain the fixed field widths for extractions, but disregard trailing spaces in the values (something that you might do with "chomp" in perl). I would rather not have to append a wildcard every time I do a search for a specific field value just because I'm unsure how many spaces there are.

I very much suspect regex is the solution to my problems here but I've been scratching my head about this for a while.

Hoping someone can help 🙂

Tags (2)
0 Karma
1 Solution

rturk
Builder

Ahhh... so once again I answer my own question.

Turns out that specifying the regular expression in your search query will capture spaces in the extracted values, but putting that same regex in your props.conf or transforms.conf will "chomp" the values for you while keeping your fixed-width field definitions.

props.conf:

[db2_system_parameters]
KV_MODE                 = none
SHOULD_LINEMERGE        = true
BREAK_ONLY_BEFORE       = ^0==
TIME_FORMAT             = %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 800
REPORT-header_fields    = db2_header_fields

transforms.conf:

[db2_header_fields]
REGEX = (?m)^0(?<primauth>.{8})\s(?<connect>.{8}) ...

Hope this helps someone else, as it wasn't behaviour I was aware of 🙂

View solution in original post

marcoscala
Builder

beware that that solution works until you have AT LEAST one blank between two fields. It you have adjacent fields, with no space in between, then Splunk has problems in handling the two fields as two different strings.

I had that problem in a case where the log had fixed lenght fields but with adjacent fields(no separators).
For instance, envent il something like

xxxxxyyyyyyyyy....... where field1 are the first 5 chars, then field2 and so on.

In that case, fields extractions works, but if you try a search line "field1=xxxxx", that Splunk returns no result. To have some result you should write "field1=xxxxx*".
This is because Splunk indexes just one string made of "xxxxxyyyyyyy...." And if you have to search for field2, it's event worst: you must write "field2=yyyyyyyyy". As you can imagine, performance implications are terrible!

The solution I tried (and it worked perfectly) was to use a transform during the parsing phase and write the _raw adding a separator (. or | or a blank) between fields. Then Splunk indexes correctly the fields.

Marco

0 Karma

Saurabhsood
New Member

@Marcoscala
U wrote
"The solution I tried (and it worked perfectly) was to use a transform during the parsing phase and write the _raw adding a separator (. or | or a blank) between fields. Then Splunk indexes correctly the fields."

Can you please guide me how to do this? ..I am new in splunk.

0 Karma

rturk
Builder

Ahhh... so once again I answer my own question.

Turns out that specifying the regular expression in your search query will capture spaces in the extracted values, but putting that same regex in your props.conf or transforms.conf will "chomp" the values for you while keeping your fixed-width field definitions.

props.conf:

[db2_system_parameters]
KV_MODE                 = none
SHOULD_LINEMERGE        = true
BREAK_ONLY_BEFORE       = ^0==
TIME_FORMAT             = %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 800
REPORT-header_fields    = db2_header_fields

transforms.conf:

[db2_header_fields]
REGEX = (?m)^0(?<primauth>.{8})\s(?<connect>.{8}) ...

Hope this helps someone else, as it wasn't behaviour I was aware of 🙂

Drainy
Champion

Why not just use a [^\s]+ to capture everything up until a space character? you can then use a \s+ to move ahead to the start of the next character set.

rturk
Builder

That was how I approached it initially, but unfortunately not all fields are mandatory so the fixed-width definitions have to be kept for data integrity. Thanks for the suggestion though 🙂

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...