Getting Data In

Parsing SIP multiline, multiformat events

inglisn
Path Finder

Hi, I'm trying to parse some logs generated by Broadsoft SIP servers. The log formats follow a general pattern, but the details can vary from event to event and field meanings can be context-sensitive.

The events are multiline, broken by a datetime string, and the first portion is pipe-separated. The fields here can differ in number and meaning, and if I use DELIMS on the pipe character it works, except that the last field flows into the remainder of the event.

The first thing I'd like to do is stop the delimiting at a defined point, which seems to be a newline character. The following transform, using "pipe or newline", doesn't work. If I make it "pipe or tab", it works better for the first line but also matches unwanted fields in the remainder of the event (many lines of which start with a tab).

[transform-bsft-xslog-test1]
# delims are pipe OR newline.
DELIMS = "|
"
FIELDS = "szDateTime" logLevel logType sipField1 sipField2 sipField3
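(For illustration only, not Splunk itself: DELIMS treats every character inside the quoted string as a single-character delimiter applied across the whole event, which is why it can't "stop" at a chosen point. The behaviour wanted here amounts to splitting only the first physical line on pipes, which a short Python sketch makes concrete; the sample line comes from the event sample below.)

```python
# Split only the first physical line on pipes; the remainder of the
# multiline event is left untouched -- the behaviour DELIMS can't give.
event = (
    "2012.06.21 02:48:15:155 EST | Info       | CallP | SIP Endpoint"
    " | +155512345678 | Service Delivery | localHost1234:5678\n"
    "\n        Processing Event: com.broadsoft.events.sip.SipReferEvent"
)

first_line, _, remainder = event.partition("\n")
fields = [f.strip() for f in first_line.split("|")]
```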

Event sample:

2012.06.21 02:48:15:155 EST | Info       | CallP | SIP Endpoint | +155512345678 | Service Delivery | localHost1234:5678

        Processing Event: com.broadsoft.events.sip.SipReferEvent

2012.06.21 02:48:15:157 EST | Info       | Accounting

        SERVICE INVOCATION ACCOUNTING EVENT
        Time Stamp: Thu Jun 21 02:48:15 EST 2012 (1340264895157)
        Accounting ID: [id]
        Service Name: Call Transfer
        Related Accounting ID: [id]


2012.06.21 02:48:14:773 EST | Info       | SipMedia | +155512345678 | localHost1234:5678

        udp 391 Bytes IN from 10.10.10.10:5060
SIP/2.0 200 OK
[various amounts (10 - 30+ lines) of SIP information trimmed]
1 Solution

bwooden
Splunk Employee

I think there are several options here, as you seem to have a variable number of fields with varying meanings in each event. One solution is to use a combination of props & transforms definitions to pull out the major/high-level extractions on a first pass and then pull out additional fields in later passes.

You could have a props.conf like this to efficiently break events, extract the timestamp, and call the field extraction pieces:

[sipSourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)\d{4}\.\d{2}\.\d{2}\s+\d{2}:\d{2}:\d{2}:\d{3}\s+\w+\s+\|
TIME_PREFIX=^
MAX_TIMESTAMP_LOOKAHEAD=28
TIME_FORMAT=%Y.%m.%d %H:%M:%S:%3N %Z
KV_MODE=none
REPORT-field_passes=pass_one, pass_two, pass_three
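The LINE_BREAKER regex above breaks a new event before each timestamped line; the newlines captured in group 1 are discarded at the boundary. A quick Python simulation against trimmed versions of the sample events (a sketch, not Splunk's actual pipeline) shows the breaking behaviour:

```python
import re

# The LINE_BREAKER regex from props.conf above; Splunk breaks events at
# each match and discards the text captured in group 1 (the newlines).
LINE_BREAKER = (r"([\r\n]+)\d{4}\.\d{2}\.\d{2}\s+"
                r"\d{2}:\d{2}:\d{2}:\d{3}\s+\w+\s+\|")

raw = (
    "2012.06.21 02:48:15:155 EST | Info | CallP | SIP Endpoint\n"
    "\n        Processing Event: com.broadsoft.events.sip.SipReferEvent\n"
    "\n2012.06.21 02:48:15:157 EST | Info | Accounting\n"
    "\n        SERVICE INVOCATION ACCOUNTING EVENT\n"
)

# Simulate the breaking: cut before each match, dropping group 1.
events, prev = [], 0
for m in re.finditer(LINE_BREAKER, raw):
    events.append(raw[prev:m.start(1)])
    prev = m.end(1)
events.append(raw[prev:])
```

The indented continuation lines never match the pattern, so they stay attached to the preceding timestamped line.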

and a corresponding transforms.conf like this, to first pull out the static known fields (pass_one), then pull out colon-separated values (pass_two), and finally run additional passes against sipFields (extracted in pass_one) to handle anything else:

[pass_one]
REGEX=^(\d{4}\.\d{2}\.\d{2}\s+\d{2}:\d{2}:\d{2}:\d{3}\s+\w+)[\s\|\t]+([^\|\t\n\r]+)[\s\|\t\n\r]+([^\|\t\n\r]+)(.*)?
#REGEX=^(\d{4}\.\d{2}\.\d{2}\s+\d{2}:\d{2}:\d{2}:\d{3}\s+\w+)[\s\|\t]+([^\|\t\n\r]+)[\s\|\t]+([^\|\t\n\r]+)(?:[\s\|\t]+)?(.*)?
FORMAT=szDateTime::$1 logLevel::$2 logType::$3 sipFields::$4
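As a sanity check, the pass_one regex can be exercised outside Splunk; here is its verbatim Python translation run against the first sample line from the question (note that the middle groups keep the padding around the pipes, so strip before comparing):

```python
import re

# The pass_one regex from transforms.conf, translated verbatim to Python.
PASS_ONE = re.compile(
    r"^(\d{4}\.\d{2}\.\d{2}\s+\d{2}:\d{2}:\d{2}:\d{3}\s+\w+)"
    r"[\s|\t]+([^|\t\n\r]+)[\s|\t\n\r]+([^|\t\n\r]+)(.*)?"
)

line = ("2012.06.21 02:48:15:155 EST | Info       | CallP"
        " | SIP Endpoint | +155512345678")
m = PASS_ONE.match(line)
szDateTime, logLevel, logType, sipFields = m.groups()
```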

[pass_two]
SOURCE_KEY=sipFields
REGEX=([^\:\t\n\r\|\d]+)\:([^\t\n\r\|]+)
FORMAT=$1::$2 
MV_ADD=true    
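With MV_ADD=true, Splunk keeps every match of the pass_two regex rather than only the first; `findall()` mimics that below. The sample text is a trimmed remainder of the accounting event above (a sketch to show which key/value pairs the pattern picks up):

```python
import re

# The pass_two regex from transforms.conf; findall() mimics MV_ADD=true
# by keeping every key:value match in the sipFields remainder.
PASS_TWO = re.compile(r"([^:\t\n\r|\d]+):([^\t\n\r|]+)")

sip_fields = (
    "\n        Accounting ID: [id]"
    "\n        Service Name: Call Transfer"
    "\n        Related Accounting ID: [id]"
)

# Keys capture the leading indentation and values a leading space,
# so strip both sides.
pairs = [(k.strip(), v.strip()) for k, v in PASS_TWO.findall(sip_fields)]
```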

[pass_three]
# another iteration for variable number of pipe separated values, etc
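The answer leaves pass_three open; one possibility (hypothetical, not from the original post) is to sweep whatever pipe-separated values remain in sipFields into a single multivalued field:

```python
import re

# Hypothetical pass_three idea: collect the variable number of
# pipe-separated values left in sipFields into one multivalued field
# (findall mimics MV_ADD=true).
PASS_THREE = re.compile(r"\|\s*([^|\t\n\r]+?)\s*(?=\||$)")

sip_fields = ("| SIP Endpoint | +155512345678"
              " | Service Delivery | localHost1234:5678")
values = PASS_THREE.findall(sip_fields)
```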


inglisn
Path Finder

Excellent, thanks.

I came across a similar "2-phase" strategy in a question about FIX logs. It's a really useful way of working with ugly log formats. I can pull out other values with rex in the search command.

You also resolved some other issues on linebreaking I was having.
