Splunk Search

How to execute searches successfully for multiline events with truncated text?

pepBR
Engager

I am facing a problem and I need some advice/help. I am sorry if it sounds silly but I am new to Splunk and couldn't find an answer so far.

I am a regular Splunk user with no access to the server or its configuration. The configuration cannot be changed, and I am supposed to take advantage of its stats and graphs as is.

I have built a script that writes log entries to a syslog, which is captured by Splunk. These log entries seem to be bigger than supported, so they are broken across several lines.

The problem is that the log is truncated at an arbitrary point in the string, so some keys are cut right in the middle. For example: I want to search for the key "this_is_my_key", but it's not found because it's split across 2 log entries. The first line ends with "this_is_m" and the second starts with "y_key". Therefore "this_is_my_key" is not found.

Is there a way to perform this search successfully?

Thank you guys.

0 Karma
1 Solution

Jeremiah
Motivator

Really, the solution here is to look at your syslog config and the Splunk config and correct the truncation issue. You should send the events over TCP instead of UDP syslog if you can, because you can send a longer event. Also, make sure you are not truncating the events yourself by inserting newlines in your syslog data.

However, if you want to work around the problem, it sounds like you do have control over the script that is writing the data to syslog. What I would suggest you do is actually break up your single event into multiple events that will not be truncated, and record a session id in the event so you know which events belong together. First, to figure out your event size limit do something like:

sourcetype=your_sourcetype | eval length=len(_raw) | stats max(length)

Then in your script, do something like this (I'm making these extremely short, but you get the idea):

<timestamp> id=1 key1=value1 key2=value2 key3=value3
<timestamp> id=1 key4=value4 key5=value5 key6=value6

And so on. You then use the id field to glue your events back together. Then perform a search in one of two ways. Either join all your data by id using stats:

sourcetype=my_sourcetype | stats avg(key1) avg(key4) by id
sourcetype=my_sourcetype | stats values(*) AS * by id

Or use a subsearch to find specific events (you don't have to search in field values of course).

sourcetype=my_sourcetype [sourcetype=my_sourcetype key4=value4 | fields id]

Of course if you don't need all of the fields, you can just do a normal search. But at least you'll avoid having your data split.
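Since you control the script, the splitting step above can be sketched in Python. This is a minimal illustration, not Jeremiah's actual code: `chunk_event` breaks one large event's key=value pairs into multiple lines that each stay under a length limit and carry the shared id, and `rejoin_events` shows how the id field lets you glue them back together (which is what the `stats ... by id` searches do on the Splunk side). The function names, the 1024-byte default, and the plain `id=N key=value` layout are all assumptions for the sketch:

```python
def chunk_event(fields, event_id, max_len=1024):
    """Split one event's key=value pairs into several lines, each
    prefixed with a shared id and kept under max_len characters.
    Note: a single pair longer than max_len is still emitted whole."""
    prefix = f"id={event_id}"
    lines = []
    current = prefix
    for key, value in fields.items():
        pair = f"{key}={value}"
        candidate = f"{current} {pair}"
        if len(candidate) > max_len and current != prefix:
            # current line is full; flush it and start a new one
            lines.append(current)
            current = f"{prefix} {pair}"
        else:
            current = candidate
    lines.append(current)
    return lines


def rejoin_events(lines):
    """Reassemble chunked lines into {id: {key: value}} dicts,
    mirroring what 'stats values(*) AS * by id' does in Splunk."""
    events = {}
    for line in lines:
        parts = line.split()
        eid = parts[0].split("=", 1)[1]
        events.setdefault(eid, {}).update(
            p.split("=", 1) for p in parts[1:]
        )
    return events
```

Each line your script emits to syslog would then be one `chunk_event` output line, so no single event ever exceeds the limit you measured with the `len(_raw)` search.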

pepBR
Engager

Thanks for the advice, I am working on the script and using "by _time" to group the results. Still working on the script and will have to wait until it goes to production to perform a real test. Will keep you posted.

0 Karma

jplumsdaine22
Influencer

Great solution as always @Jeremiah - you should make this an answer

0 Karma
