Field extraction at index, or use transforms?

Toups
Explorer

I am working with the following input and wanted some advice on how/where to specify the field extractions:

"\x00\x00\x00103700079  C9E840    13372786523      7137                210018  51730064  #850 1      000         "

I have documentation from the vendor specifying value lengths and definitions, and we can perform most field extractions via individual regex field extractions, but we wanted to know if there is a better or more efficient method recommended.

For reference, the field mapping table is listed below, and I have included samples of a couple of the current field extractions.

Position  Field
1-2       Time of day-hours
3-4       Time of day-minutes
5         Duration-hours
6-7       Duration-minutes
8         Duration-tenths of minutes
9         Condition code
10-13     Access code dialed
14-17     Access code used
18-32     Dialed number
33-42     Calling number
43-57     Account code
58-64     Authorization code
65-66     Space
67        FRL
68-70     Incoming circuit ID (hundreds, tens, units)
71-73     Outgoing circuit ID (hundreds, tens, units)
74        Feature flag
75-76     Attendant console
77-80     Incoming TAC
81-82     Node number
83-85     INS
86-88     IXC
89        BCC
90        MA-UUI
91        Resource flag
92-95     Packet count
96        TSC flag
97-100    Reserved
101       Carriage return (Not displayed)
102       Line feed (Not displayed)
103-105   Null (displayed as “\x00\x00\x00” at beginning of new line)

For example, to extract the duration hours, minutes, tenths of minutes we use the following regex:

"^.{16}(?<duration_hours>\d{1})" 
"^.{17}(?<duration_minutes>\d{2})" 
"^.{19}(?<duration_tenths_minutes>\d{1})" 

ziegfried
Influencer

A single regular expression is IMO the most efficient way to extract the fields here. To get rid of the \x00 values in your events, you could adjust the LINE_BREAKER settings of your sourcetype:

props.conf:

[<your sourcetype>]
LINE_BREAKER=([\x00\r\n]+)
EXTRACT-fields=<the regex here>
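
To picture what the character class in that LINE_BREAKER is meant to do, here is a minimal Python sketch. It is not Splunk itself, only the same character class applied with Python's `re` module. It assumes the stream contains real NUL, CR and LF bytes, as the vendor table suggests; the second record in the sample is hypothetical and shortened.

```python
import re

# Character class copied from the LINE_BREAKER setting above.
BREAKER = re.compile(r"[\x00\r\n]+")

def split_events(raw):
    """Split a raw stream into events and drop the delimiter runs."""
    return [chunk for chunk in BREAKER.split(raw) if chunk]

# Hypothetical stream: two shortened records, each followed by
# CR, LF and three NULs (positions 101-105 in the vendor table).
raw = "103700079  C9E840\r\n\x00\x00\x00104100012  C9E841\r\n\x00\x00\x00"

events = split_events(raw)
print(events)
```

Each run of NUL, CR and LF characters is treated as one delimiter, so no event starts with the null bytes.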

gkanapathy
Splunk Employee

Most efficient would probably be a single search-time REGEX extraction:

EXTRACT-fields = (?<hour>.{2})(?<min>.{2})(?<duration_h>.)(?<duration_m>.{2})(?<duration_mtenths>.)(?<cc>.)(?<accesscd_dialed>.{4})(?<accesscd_used>.{4})(?<num_dialed>.{15})(?<num_calling>.{10})

And so on. That way, all fields are extracted in a single pass over the data. Note that with this particular data, you may run into problems searching for a field by a specific value when that value is pressed right up against adjacent fields with no white space. If you commonly search on such fields, you can handle them with index-time extractions, but do so selectively and only if you determine it's really necessary for that field (e.g., don't do it with the time fields, and probably not with the dialed number).
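
Since the full expression is long and easy to mistype, one option is to generate it from the vendor table. The Python sketch below does this for positions 1-42 only. The field names follow the regex above; the leading `^` anchor, the `build_pattern` helper and the whitespace stripping are assumptions added for illustration. Python's `re` needs `(?P<name>...)` rather than `(?<name>...)`, so the helper can emit either form. The sample is the first 42 characters of the event from the question, with the leading nulls already removed.

```python
import re

# (name, width) pairs for positions 1-42 of the vendor table.
# Extend the list with the remaining columns as needed.
FIELDS = [
    ("hour", 2), ("min", 2), ("duration_h", 1), ("duration_m", 2),
    ("duration_mtenths", 1), ("cc", 1), ("accesscd_dialed", 4),
    ("accesscd_used", 4), ("num_dialed", 15), ("num_calling", 10),
]

def build_pattern(fields, python_syntax=False):
    """Concatenate one fixed-width named group per field."""
    prefix = "?P<" if python_syntax else "?<"
    parts = []
    for name, width in fields:
        quant = "." if width == 1 else ".{%d}" % width
        parts.append("(%s%s>%s)" % (prefix, name, quant))
    return "^" + "".join(parts)

# First 42 characters of the sample event, spaces spelled out explicitly.
sample = ("103700079" + " " * 2 + "C9E840" + " " * 4
          + "13372786523" + " " * 6 + "7137")

match = re.match(build_pattern(FIELDS, python_syntax=True), sample)
# Strip the padding so values compare cleanly (an added convenience).
record = {name: value.strip() for name, value in match.groupdict().items()}

print("EXTRACT-fields = " + build_pattern(FIELDS))
print(record)
```

Keeping the widths in one list makes it easier to check them against the vendor documentation than reading a single long regex.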

Toups
Explorer

Thank you, I think this is the information we were looking for.
Your time and attention are greatly appreciated!

gkanapathy
Splunk Employee

Because if you're not searching for the specific values, indexing more fields will increase the size of the index, which can decrease performance for all searches. If you are searching rarely for specific values of fieldname, you can search with fieldname=*value* (vs fieldname=value) which will work but will be slower for that search only. If you are not searching for specific values, but reporting instead (e.g., stats count by number_dialed) then indexed fields are no better than search-time extracted ones.
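
One way to picture the difference between fieldname=value and fieldname=*value* is the deliberately simplified Python model below. It assumes the index only stores whitespace-delimited tokens, which is an assumption for illustration; Splunk's real segmentation rules are more involved. The function names are hypothetical, and the event is the first 42 characters of the sample from the question.

```python
def indexed_tokens(event):
    """Simplified stand-in for index terms: whitespace-delimited tokens."""
    return set(event.split())

# First 42 characters of the sample event, spaces spelled out explicitly.
event = ("103700079" + " " * 2 + "C9E840" + " " * 4
         + "13372786523" + " " * 6 + "7137")
tokens = indexed_tokens(event)

def exact_lookup(value):
    """Models fieldname=value: the value must exist as a whole token."""
    return value in tokens

def wildcard_lookup(value):
    """Models fieldname=*value*: every token is scanned for a substring."""
    return any(value in token for token in tokens)

# "E840" sits directly against "C9", so it is never a token on its own.
print(exact_lookup("E840"), wildcard_lookup("E840"))  # False True
# "7137" is surrounded by spaces, so both lookups find it.
print(exact_lookup("7137"), wildcard_lookup("7137"))  # True True
```

In this model the wildcard form finds the adjacent value only by scanning every token, which matches the point above that it works but is slower for that search.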

Toups
Explorer

It sounds like index-time extraction is best, as many of the fields are adjacent.

Why do you recommend against extracting items such as time or dialed number at index time? The target application will be a Call Detail Record index, and a sub-component of an event correlation system.

Toups
Explorer

The code:

LINE_BREAKER=([\x00\r\n]+)

Does not appear to be removing the "\x00\x00\x00" from the
