Splunk Search

How to execute searches successfully for multiline events with truncated text?

pepBR
Engager

I am facing a problem and I need some advice/help. I am sorry if it sounds silly but I am new to Splunk and couldn't find an answer so far.

I am a regular Splunk user with no access to the server or its configuration. The configuration cannot be changed, and I am supposed to take advantage of its stats and graphs as is.

I have built a script that writes log entries to a syslog, which is captured by Splunk. These log entries seem to be bigger than supported, so they are broken across several lines.

The problem is that the log is truncated at an arbitrary point in the string, so some keys are cut right in the middle. For example: I want to search for the key "this_is_my_key", but it's not found because it's split across 2 log entries. The first line ends with "this_is_m" and the second starts with "y_key". Therefore "this_is_my_key" is not found.

Is there a way to perform this search successfully?

Thank you guys.

0 Karma
1 Solution

Jeremiah
Motivator

Really, the solution here is to look at your syslog config and the Splunk config and correct the truncation issue. You should send the events over TCP instead of UDP syslog if you can, because you can send a longer event. Also, make sure you are not truncating the events yourself by inserting newlines in your syslog data.

However, if you want to work around the problem, it sounds like you do have control over the script that is writing the data to syslog. What I would suggest you do is actually break up your single event into multiple events that will not be truncated, and record a session id in the event so you know which events belong together. First, to figure out your event size limit do something like:

sourcetype=your_sourcetype | eval length=len(_raw) | stats max(length)

Then in your script, do something like this (I'm making these extremely short, but you get the idea):

<timestamp> id=1 key1=value1 key2=value2 key3=value3
<timestamp> id=1 key4=value4 key5=value5 key6=value6

And so on. You then use the id field to glue your events back together. Then perform a search in one of two ways. Either join all your data by id using stats:

sourcetype=my_sourcetype | stats avg(key1) avg(key4) by id
sourcetype=my_sourcetype | stats values(*) AS * by id

Or use a subsearch to find specific events (you don't have to search in field values of course).

sourcetype=my_sourcetype [sourcetype=my_sourcetype key4=value4 | fields id]

Of course if you don't need all of the fields, you can just do a normal search. But at least you'll avoid having your data split.
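Since you control the script, the splitting step above can be sketched in Python. This is a minimal illustration, not Jeremiah's actual code: `chunk_event` breaks one large event's key=value pairs into multiple lines that each stay under a length limit and carry the shared id, and `rejoin_events` shows how the id field lets you glue them back together (which is what the `stats ... by id` searches do on the Splunk side). The function names, the 1024-byte default, and the plain `id=N key=value` layout are all assumptions for the sketch:

```python
def chunk_event(fields, event_id, max_len=1024):
    """Split one event's key=value pairs into several lines, each
    prefixed with a shared id and kept under max_len characters.
    Note: a single pair longer than max_len is still emitted whole."""
    prefix = f"id={event_id}"
    lines = []
    current = prefix
    for key, value in fields.items():
        pair = f"{key}={value}"
        candidate = f"{current} {pair}"
        if len(candidate) > max_len and current != prefix:
            # current line is full; flush it and start a new one
            lines.append(current)
            current = f"{prefix} {pair}"
        else:
            current = candidate
    lines.append(current)
    return lines


def rejoin_events(lines):
    """Reassemble chunked lines into {id: {key: value}} dicts,
    mirroring what 'stats values(*) AS * by id' does in Splunk."""
    events = {}
    for line in lines:
        parts = line.split()
        eid = parts[0].split("=", 1)[1]
        events.setdefault(eid, {}).update(
            p.split("=", 1) for p in parts[1:]
        )
    return events
```

Each line your script emits to syslog would then be one `chunk_event` output line, so no single event ever exceeds the limit you measured with the `len(_raw)` search.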

pepBR
Engager

Thanks for the advice, I am working on the script and using "by _time" to group the results. Still working on the script and will have to wait until it goes to production to perform a real test. Will keep you posted.

0 Karma

jplumsdaine22
Influencer

Great solution as always @Jeremiah - you should make this an answer

0 Karma
