Splunk not breaking events on line break properly

jcfergus
Engager

Ok, I'm at my wits' end here. I have an application log which produces events of the format:

DEBUG | 2012-02-16 11:01:30,683 [http-10.0.0.1-8443-Processor6] SystemFile  - field1=value1 timestamp=2012-02-16 11:01:30.679 CST   field2=value2   field3=value3   field4=value4   field5= field6=value6   field7=A field value with spaces in it  field8=
DEBUG | 2012-02-16 11:01:32,457 [http-10.0.0.1-8443-Processor10] SystemFile  - field1=value1    timestamp=2012-02-16 11:01:32,450 CST   field2=value2   field3= field4=value4   field5= field6=value6   field7=Another field with spaces in it  field8=value8

Basically tab-delimited name/value pairs, with nice neat newlines at the end of the lines (I've verified the line breaks and tabs in a hex editor, and all events are being written via the same log4j config). I -thought- I had it all being parsed just fine, but it appears that the index-time parsing is not always splitting the events on newlines, and I end up with two (or three, or four, or five) log lines in one event. They have different timestamps, so it's not that Splunk is rolling them up into one because of matching times (the above two events are a sanitized example of two that got rolled together). I would suspect the cause is that the first one ends with an equals sign (no value), but there are plenty of events in the same log that look identical and get split properly. I'm stumped.

My props.conf for the log source looks like:

[MySourceType]
LINE_BREAKER = ([\r\n]+)
REPORT-tab-kv-manual = tab-kv-manual
KV_MODE = NONE
TIME_PREFIX = DEBUG
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
MAX_TIMESTAMP_LOOKAHEAD = 30

And my transforms.conf looks like:

[tab-kv-manual]
REGEX = (\t|- )([^=]+)=([^\t\n]*)
FORMAT = $2::$3
REPEAT_MATCH = true

Any suggestions?

thisissplunk
Builder

Did you ever figure this out? Having the same issue. Testing the explicit line breaker currently.

somesoni2
Revered Legend

What is your data format? Also, include "SHOULD_LINEMERGE=false" in props.conf along with LINE_BREAKER.
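
For context: SHOULD_LINEMERGE defaults to true, in which case Splunk takes the lines produced by LINE_BREAKER and may merge them back into multi-line events according to its line-merging rules. That would be consistent with several log lines landing in one event. A minimal sketch of the change, assuming the stanza name from the question:

```ini
[MySourceType]
# Treat every chunk produced by LINE_BREAKER as its own event.
# With the default (true), Splunk may merge lines back together.
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
```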

kristian_kolb
Ultra Champion

I've been there as well, and while it looks like your LINE_BREAKER regex is correct, I think I remember that being a bit more explicit solved the issue:

LINE_BREAKER = ([\r\n]+)[A-Z]+\s+\|\s+\d+

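Also, your TIME_PREFIX is wrong: it matches only the literal DEBUG, so it stops short of the " | " separator in front of the timestamp and would not match any other log level. It should be: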

TIME_PREFIX = ^[A-Z]+\s+\|\s+

Hope this helps,

Kristian
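
Putting the suggestions in this thread together, the full props.conf stanza might look like the sketch below. It has not been tested against the original data. The stanza name and the settings not discussed in the replies are carried over unchanged from the question.

```ini
[MySourceType]
# One event per LINE_BREAKER match; no line merging afterwards.
SHOULD_LINEMERGE = false
# Break on newlines only when the next line starts with "LEVEL | <digit>".
# Only the text matched by the first capture group (the newlines) is discarded.
LINE_BREAKER = ([\r\n]+)[A-Z]+\s+\|\s+\d+
# Skip the log level and the " | " separator so the timestamp search
# starts right at "2012-02-16 11:01:30,683".
TIME_PREFIX = ^[A-Z]+\s+\|\s+
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
MAX_TIMESTAMP_LOOKAHEAD = 30
# Search-time field extraction, as in the original configuration.
KV_MODE = NONE
REPORT-tab-kv-manual = tab-kv-manual
```

Line-breaking and timestamp settings take effect at index time, so they need to be deployed where the data is parsed (indexer or heavy forwarder) and only affect events indexed after the change. Events that were already merged stay merged unless the data is re-indexed.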
