Getting Data In

Multiline events with pauses between lines

heathm
Explorer

I'm consuming a QA test log with a fairly erratic format, but I was able to write a line-breaker regex that groups the lines into events the way I want. It works perfectly when Splunk consumes the entire file at once, but under normal conditions a "single" event is made up of multiple messages written over a span of several minutes or longer as the QA test steps through its tasks. In the example below, that total span is about 20 minutes. Under those conditions, Splunk indexes each message as its own event no matter what I try.

Here's a sample event (snipped for brevity):

2010-Dec-09 16:19 - loading suite testKprSsl.rb
startup logic
switching fbc from http to https
got fbc.cfg
deploying new fbc.cfg updated with https
Loaded suite testKprSsl
Started
TestKprPrivileges: 
  test_100_ConfigureFbc:                

.: (13.903755)
  test_101_UpgradeFbc:              

.: (13.932528)
  test_102_QueryFbc:                    

.: (20.092743)
(snip)
Finished in 1248.147737289 seconds.
78 tests, 264 assertions, 2 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
97.4359% passed

Here is the props.conf entry I used that worked fine when indexing complete files - it looks for the particular date format at the start of a new line:

[watcherresults]
LINE_BREAKER = ([\r\n])+(\d+)-[A-Za-z]{3}-(\d+\s)
TIME_PREFIX = ^
TIME_FORMAT = %Y-%b-%d %H:%M
TRUNCATE = 0

I have also tried various combinations of BREAK_ONLY_BEFORE regexes, SHOULD_LINEMERGE, and BREAK_ONLY_BEFORE_DATE, but the messages still show up as multiple events.

The indexed file is written on a remote server and forwarded via a lightweight forwarder (LWF) to a Windows 7 system running version 4.1.5, build 85165. Is there a way to keep these messages together as a single event, or to reassemble them after the fact?

1 Solution

mitch_1
Splunk Employee

Yes, usually when reading from a file there is a presumption that one "event" is written in a single chunk. So if Splunk is reading a file and notices that its size doesn't change for a while, it will take that as an event boundary.

Options that I can think of off the top of my head:

  • You could try increasing "time_before_close" in the inputs.conf stanza (note for old-timers: this setting used to be a global in limits.conf once upon a time). Its value is in seconds and defaults to 3. If you know, for example, that your unit tests will never leave the file idle for more than 60 seconds in the middle of an event, you could try "time_before_close = 60" (see the example stanza after this list).

    If you set it too high you might see some delay between when the last event happens and when it becomes searchable, though. When Splunk is at the end of the file, it needs something to tell it to stop waiting for a new LINE_BREAKER and decide that it now has a whole event.

  • If you don't mind doing a bit of programming, you could feed the data in via a scripted input instead of reading directly from a file. Then you could feed each event to Splunk in a single chunk and not worry about Splunk seeing half an event.
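
For reference, a minimal inputs.conf monitor stanza for the first option might look something like this (the file path is just a placeholder for wherever the QA log actually lands):

[monitor:///opt/qa/logs/watcherresults.log]
sourcetype = watcherresults
time_before_close = 60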

Jordan_Brough
Path Finder

mitch_1's solution of time_before_close=120 lessened the problem for us but didn't solve it.

We have multiline events with timestamps on each line, and every 3 seconds (the default time_before_close) the universal forwarder would cut off the most recent event and send the data to the indexer, splitting a single event into 2 events (or sometimes even 3 if there was a pause > 3s). We tried both LINE_BREAKER and BREAK_ONLY_BEFORE to no avail.

time_before_close=120 helped a little, since incorrectly-split events then happened every 120 seconds instead of every 3 seconds.

After working with Splunk support for about a month, they confirmed that this is just how the universal forwarder works. The available workarounds were either 1) buffer the logs in memory until an "event" was complete and write the entire event to disk at once (roughly along the lines of the sketch below), or 2) use a heavy forwarder (apparently it has extra capabilities that can be configured to solve this problem). We ended up going with option 1 (buffering) and it solved the problem completely.
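
For anyone hitting the same thing, here is a rough sketch of the buffering idea, assuming you can pipe the test output through a small wrapper before it reaches the file the forwarder monitors (the regex, path, and script name are illustrative, not the exact code we used):

#!/usr/bin/env python
# buffer_events.py (name made up): accumulate lines in memory until a new
# event starts, then append the whole previous event to the monitored file
# in a single write so the forwarder never sees a partial event.
import re
import sys

# Start-of-event pattern, e.g. "2010-Dec-09 16:19 - loading suite ..."
EVENT_START = re.compile(r'^\d{4}-[A-Za-z]{3}-\d{2} \d{2}:\d{2} ')
LOG_PATH = '/var/log/qa/watcherresults.log'  # hypothetical monitored file

def flush(buf, out):
    # One write per event, then flush, so the event hits disk as a unit.
    if buf:
        out.write(''.join(buf))
        out.flush()

def main():
    buf = []
    with open(LOG_PATH, 'a') as out:
        for line in sys.stdin:
            if EVENT_START.match(line) and buf:
                flush(buf, out)  # the previous event is now complete
                buf = []
            buf.append(line)
        flush(buf, out)  # whatever is left when the run ends

if __name__ == '__main__':
    main()

You would pipe the test run through it, e.g. "your_test_command 2>&1 | python buffer_events.py", so the forwarder only ever reads whole events from the monitored file.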

heathm
Explorer

Increasing "time_before_close" was the key. It does add delay, but that's not a problem in this case.

gkanapathy
Splunk Employee

The problem could be your use of a LWF. Do you have autoLB enabled? autoLB assumes that a single event is written to the output file atomically. If there is a window in which the LWF can read only part of an event, it might break the event there. (One of the characteristics of a LWF is that it does not parse event boundaries, so it uses this heuristic/assumption to decide where it can break/load-balance across indexers.) What happens if you switch to a heavy forwarder (and move the line-breaking and timestamp config to the heavy forwarder, as would be required)?

But you definitely should be able to reassemble broken events. You ought to be able to use nothing but a BREAK_ONLY_BEFORE.

Alternatively, if the events are large, your problem may simply be that you need to increase TRUNCATE (default of 50k) in the case where SHOULD_LINEMERGE = false, or MAX_EVENTS (default is 256) in the case where SHOULD_LINEMERGE = true. Actually, this is probably the first thing you should check.
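
To be concrete, something along these lines in props.conf is where I would start (the regex and limits below are illustrative and would need to be adjusted to your data):

[watcherresults]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^\d{4}-[A-Za-z]{3}-\d{2}\s\d{2}:\d{2}
MAX_EVENTS = 10000
TRUNCATE = 0
TIME_PREFIX = ^
TIME_FORMAT = %Y-%b-%d %H:%M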

heathm
Explorer

Thanks - I did end up switching to a heavy forwarder, so this was helpful, but ultimately I needed to increase "time_before_close" as mitch suggested to get it working reliably.
