I apologize for the formatting of my Splunk configuration settings. For some reason this UI is removing the "_" from my configuration lines, even when I type them manually rather than copying them in from a *nix window. So please be aware that any missing underscores are not typos but what happened when I pasted the lines in.
+++++++++++++++++++++++++++++++++++++++++
I have been working on many solutions to this line break issue and will turn to the Answer Gawds for help.
The data I have looks like the following:
25 Apr 2012 14:38:39.430 [INFO] [WorkerThread45] [com.blah.accessor.LeadIntakeManagerAccessor] - RealTimeCall Bean message received is: <LiveCall>
blah
blah
whole bunch of lines, but ending with the following LiveCall line
blah
blah
</LiveCall>
I have tried iterations of the following to get the data to break after the last line in each event ("</LiveCall>") but before the date, which is listed first.
+++++++++++++++++++++++++++++++++++++++++
Example 1 - (output from btool listed at very bottom)
props.conf
[source:///flocal/logs/netkernel/intakeform/emaillead.log]
MAX_TIMESTAMP_LOOKAHEAD=24
LINE_BREAKER = "<\/LiveCall>"
+++++++++++++++++++++++++++++++++++++++++
Example 2
[source:///flocal/logs/netkernel/intakeform/emaillead.log]
MAX_TIMESTAMP_LOOKAHEAD=24
MUST_BREAK_AFTER = "<\/LiveCall>"
+++++++++++++++++++++++++++++++++++++++++
Example 3
[source:///flocal/logs/netkernel/intakeform/emaillead.log]
MAX_TIMESTAMP_LOOKAHEAD=24
++++++++++++++++++++++++++++++++++++++
Output from "splunk cmd btool props list --debug"
system [source:///flocal/logs/netkernel/intakeform/emaillead.log]
system ANNOTATE_PUNCT = True
system BREAK_ONLY_BEFORE =
system BREAK_ONLY_BEFORE_DATE = True
system CHARSET = UTF-8
system DATETIME_CONFIG = /etc/datetime.xml
system HEADER_MODE =
system LEARN_SOURCETYPE = true
system LINE_BREAKER = <\/LiveCall>
system LINE_BREAKER_LOOKBEHIND = 100
system MAX_DAYS_AGO = 2000
system MAX_DAYS_HENCE = 2
system MAX_DIFF_SECS_AGO = 3600
system MAX_DIFF_SECS_HENCE = 604800
system MAX_EVENTS = 256
system MAX_TIMESTAMP_LOOKAHEAD = 24
system MUST_BREAK_AFTER =
system MUST_NOT_BREAK_AFTER =
system MUST_NOT_BREAK_BEFORE =
system SEGMENTATION = indexing
system SEGMENTATION-all = full
system SEGMENTATION-inner = inner
system SEGMENTATION-outer = outer
system SEGMENTATION-raw = none
system SEGMENTATION-standard = standard
system SHOULD_LINEMERGE = True
system TRANSFORMS =
system TRUNCATE = 10000
system maxDist = 100
+++++++++++++++++++++++++++++++++++++++++
So after trying many, many "sure things" Splunk has decided not to play nicely.
Any thoughts on what I might have missed?
Did you try?
SHOULD_LINEMERGE = false
LINE_BREAKER = </LiveCall>([\r\n]+)
MAX_TIMESTAMP_LOOKAHEAD = 24
TIME_FORMAT = %d %b %Y %H:%M:%S.%3N
UPDATE:
No you don't need to linemerge. Or. Well. Maybe you do. I don't have access to your logs.
But the point of LINE_BREAKER is that Splunk does not see 'events' or 'lines' at this stage in the parsing. It's just a stream of data flowing through the parser. When the regex matches, Splunk says "Stop! Break Time!" and creates a new event (removing the captured group, i.e. the newline just after </LiveCall>, in the process).
Then data starts flowing again until the regex matches the next time. Thus the resulting events can be either single-line or multiline.
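To make the stream description above concrete, here is a rough Python sketch of that behaviour. It is only a simulation: `break_stream` is a made-up helper, Python's `re` module stands in for Splunk's regex engine, and the sample data (including the second timestamp and thread name) is invented to resemble the log in the question.

```python
import re

def break_stream(data, line_breaker):
    """Split a raw stream the way LINE_BREAKER is described above:
    the text captured by group 1 is discarded, everything before it
    stays in the current event, everything after it starts the next."""
    events = []
    start = 0
    for match in re.finditer(line_breaker, data):
        events.append(data[start:match.start(1)])
        start = match.end(1)
    if data[start:]:
        events.append(data[start:])
    return events

# Invented sample modelled on the log lines in the question.
sample = (
    "25 Apr 2012 14:38:39.430 [INFO] [WorkerThread45] - message received is: <LiveCall>\n"
    "blah\n"
    "</LiveCall>\n"
    "25 Apr 2012 14:38:40.112 [INFO] [WorkerThread46] - message received is: <LiveCall>\n"
    "blah\n"
    "blah\n"
    "</LiveCall>\n"
)

events = break_stream(sample, r"</LiveCall>([\r\n]+)")
for event in events:
    print(repr(event))
```

On this sample the sketch yields two multiline events, each ending in `</LiveCall>`, with the newline after the closing tag dropped. Real Splunk behaviour also depends on the other props.conf settings, so treat this as a way to sanity-check the regex rather than as proof of what the indexer will do.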
/kristian
Yeah...that seemed like a conflict in configs, but at this point I was trying anything.
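You're welcome. A custom LINE_BREAKER like this won't give you the events you expect unless SHOULD_LINEMERGE = false, because line merging otherwise re-combines the pieces afterwards. Should have stated that more clearly, perhaps.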
/k
Added Kristian's last line and reran....without success
SHOULD_LINEMERGE = false
Kristian...thanks for the ideas.
Here is the last props.conf I added....without success.
LINE_BREAKER = ([\r\n]+)
MAX_TIMESTAMP_LOOKAHEAD = 24
TIME_FORMAT = %d %b %Y %H:%M:%S.%3N
If anyone has any more ideas I will gladly run them against some data. But if I don't hear anything, I will open a ticket with Splunk and report their findings if they can find the root cause.
That is exactly how I would describe LINE_BREAKER too. But the reason I had to post is that Splunk is NOT breaking in this manner, and around the last event.
I will be trying your LINE_BREAKER & TIME_FORMAT example and repost the results.
see update above. /k
Kristian.....I like your LINE_BREAKER line but I need to have Linemerge on my data. I will try bits of yours and get back.
I'd probably do LINE_BREAKER = ([\r\n]+)(?=\d{1,2}\s+\w{3}\s+\d{4}\s+\d{2}\:\d{2}\:\d{2}\.\d{3})
That is: "Break on a newline which occurs before a likely timestamp."
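A quick way to sanity-check the regex above outside Splunk is to run it over a small sample in Python. This is a sketch under assumptions: Python's `re` is used in place of Splunk's regex engine, the sample data is invented to resemble the log in the question, and the splitting loop only imitates the "discard group 1" behaviour.

```python
import re

# The regex suggested above, unchanged.
LINE_BREAKER = r"([\r\n]+)(?=\d{1,2}\s+\w{3}\s+\d{4}\s+\d{2}\:\d{2}\:\d{2}\.\d{3})"

# Invented sample: two events, each starting with a timestamp.
sample = (
    "25 Apr 2012 14:38:39.430 [INFO] [WorkerThread45] - message received is: <LiveCall>\n"
    "blah\n"
    "</LiveCall>\n"
    "25 Apr 2012 14:38:40.112 [INFO] [WorkerThread46] - message received is: <LiveCall>\n"
    "blah\n"
    "blah\n"
    "</LiveCall>\n"
)

# Imitate the break: drop what group 1 captured, keep everything else.
events = []
start = 0
for match in re.finditer(LINE_BREAKER, sample):
    events.append(sample[start:match.start(1)])
    start = match.end(1)
events.append(sample[start:])

print(len(events), "events")
for event in events:
    print(repr(event))
```

Only the newline that sits directly in front of a timestamp matches, so the newlines inside each `<LiveCall>` block are left alone and the sample comes out as two events.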
sowings....Just tried this LINE_BREAKER and still no luck. Will try Kristian next.
You can add a \ before an underscore to get a literal underscore (otherwise, it's treated like a markup indicator).
You need to have a capturing group in the regex defined for LINE_BREAKER. This group is what Splunk will consider to be the "newline" (and therefore break on and remove).
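As a small illustration of the capturing-group point, the sketch below counts the groups in two patterns: the value btool reports for Example 1, and the form with a group suggested elsewhere in this thread. Python's `re` is only a stand-in for Splunk's regex engine here, and the variable names are made up.

```python
import re

# Value shown for LINE_BREAKER in the btool output for Example 1.
attempted = r"<\/LiveCall>"
# Form with a capturing group around the newline(s), as suggested in this thread.
suggested = r"</LiveCall>([\r\n]+)"

attempted_groups = re.compile(attempted).groups
suggested_groups = re.compile(suggested).groups

print(attempted_groups, suggested_groups)  # prints: 0 1
```

The first pattern has no group at all, so there is nothing for Splunk to treat as the delimiter; the second has exactly one.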
But I'm not sure why you would want to set a custom LINE_BREAKER for these logs if they're already separated by newlines; it seems totally possible to handle this with line merging settings instead. I find that it's almost always much easier to deal with line merging settings than with line breaking. If the event really ends at "</LiveCall>" and is followed by the start of a new event marked by a timestamp, I don't see why Splunk wouldn't break the events correctly right from the start. If it doesn't, though, setting SHOULD_LINEMERGE=true and MUST_BREAK_AFTER=</LiveCall> should do the trick.
I doubt it's "random". Most likely it's breaking just as it's configured to, only you/we don't yet know which configuration rule is causing this behaviour. Find when and why it's breaking and you might well solve the whole issue. How long are these events, for instance? Maybe you're running into the MAX_EVENTS limit (default is 256 lines). Can you paste samples here or on pastebin?
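One way to check the MAX_EVENTS theory is to count how many lines the longest `<LiveCall>` block spans. The sketch below is a hypothetical helper, not anything Splunk ships: `longest_block` and the generated sample are made up, and it assumes each block opens on a line containing `<LiveCall>` and closes on a line containing `</LiveCall>`.

```python
MAX_EVENTS = 256  # value shown in the btool output in the question

def longest_block(text):
    """Return the largest number of lines from an opening <LiveCall>
    line to its closing </LiveCall> line, both inclusive."""
    longest = 0
    current = 0
    inside = False
    for line in text.splitlines():
        if not inside and "<LiveCall>" in line:
            inside = True
            current = 0
        if inside:
            current += 1
            if "</LiveCall>" in line:
                longest = max(longest, current)
                inside = False
    return longest

# Generated sample: one event with 300 body lines.
sample = "\n".join(
    ["25 Apr 2012 14:38:39.430 [INFO] - message received is: <LiveCall>"]
    + ["blah"] * 300
    + ["</LiveCall>"]
)

lines = longest_block(sample)
print(lines, "lines; exceeds MAX_EVENTS:", lines > MAX_EVENTS)
```

To run it on real data, read the log file into a string and pass that to `longest_block`. If the result is above 256, merged events would be cut at that limit, which could look like breaks in the middle of an event.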
Randomly.....sometimes on a date; sometimes in the middle of the multiline event. Just random.
How is it breaking with default settings?
Your thoughts are dead on.....the only issue is I have already tried that setup. Seems the most logical. The btool output listed above only shows from my first example. But I can assure you I have tried MUST_BREAK_AFTER= coupled with SHOULD_LINEMERGE=true for this source. Any other thoughts?
I must be missing something because usually Splunk is VERY good at linebreaking based on Dates, especially when told it is the first 24 characters of any event.
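For what it's worth, the "first 24 characters" claim is easy to double-check against the sample line from the question. The sketch below slices off 24 characters and parses them in Python; `%f` is used as an approximate stand-in for Splunk's `%3N`, and parsing `%b` assumes an English (or C) locale.

```python
from datetime import datetime

# First line of the sample event from the question.
first_line = (
    "25 Apr 2012 14:38:39.430 [INFO] [WorkerThread45] "
    "[com.blah.accessor.LeadIntakeManagerAccessor] - "
    "RealTimeCall Bean message received is: <LiveCall>"
)
LOOKAHEAD = 24  # MAX_TIMESTAMP_LOOKAHEAD from the question

# Everything the timestamp parser would be allowed to look at.
stamp = first_line[:LOOKAHEAD]

# Python analogue of TIME_FORMAT = %d %b %Y %H:%M:%S.%3N
parsed = datetime.strptime(stamp, "%d %b %Y %H:%M:%S.%f")

print(repr(stamp), parsed.isoformat())
```

The slice covers the whole timestamp including the milliseconds, so a lookahead of 24 is long enough for this format.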