Getting Data In

props.conf not breaking on data correctly

MasterOogway
Communicator

I apologize for the formatting of my configuration settings. For some reason this UI strips the "_" from my configuration lines, even when I type them manually rather than copying them in from a *nix window. So please be aware that any missing underscores are not typos but a side effect of posting the lines here.

+++++++++++++++++++++++++++++++++++++++++

I have been working on many solutions to this line break issue and will turn to the Answer Gawds for help.
The data I have looks like the following:

25 Apr 2012 14:38:39.430 [INFO] [WorkerThread45] [com.blah.accessor.LeadIntakeManagerAccessor] - RealTimeCall Bean message received is: <LiveCall>

blah
blah
whole bunch of lines, but ending with the following LiveCall line
blah
blah
</LiveCall>

I have tried iterations of the following to get the data to break after the last line of each event ("</LiveCall>") and before the date that starts the next event.

+++++++++++++++++++++++++++++++++++++++++

Example 1 - (output from btool listed at very bottom)

props.conf

[source:///flocal/logs/netkernel/intakeform/emaillead.log]
MAX_TIMESTAMP_LOOKAHEAD=24
LINE_BREAKER = "<\/LiveCall>"

+++++++++++++++++++++++++++++++++++++++++

Example 2

[source:///flocal/logs/netkernel/intakeform/emaillead.log]
MAX_TIMESTAMP_LOOKAHEAD=24
MUST_BREAK_AFTER = "<\/LiveCall>"

+++++++++++++++++++++++++++++++++++++++++

Example 3

[source:///flocal/logs/netkernel/intakeform/emaillead.log]
MAX_TIMESTAMP_LOOKAHEAD=24

++++++++++++++++++++++++++++++++++++++

Output from "splunk cmd btool props list --debug"

 system     [source:///flocal/logs/netkernel/intakeform/emaillead.log]
 system     ANNOTATE_PUNCT = True
 system     BREAK_ONLY_BEFORE = 
 system     BREAK_ONLY_BEFORE_DATE = True
 system     CHARSET = UTF-8
 system     DATETIME_CONFIG = /etc/datetime.xml
 system     HEADER_MODE = 
 system     LEARN_SOURCETYPE = true
 system     LINE_BREAKER = <\/LiveCall>
 system     LINE_BREAKER_LOOKBEHIND = 100
 system     MAX_DAYS_AGO = 2000
 system     MAX_DAYS_HENCE = 2
 system     MAX_DIFF_SECS_AGO = 3600
 system     MAX_DIFF_SECS_HENCE = 604800
 system     MAX_EVENTS = 256
 system     MAX_TIMESTAMP_LOOKAHEAD = 24
 system     MUST_BREAK_AFTER = 
 system     MUST_NOT_BREAK_AFTER = 
 system     MUST_NOT_BREAK_BEFORE = 
 system     SEGMENTATION = indexing
 system     SEGMENTATION-all = full
 system     SEGMENTATION-inner = inner
 system     SEGMENTATION-outer = outer
 system     SEGMENTATION-raw = none
 system     SEGMENTATION-standard = standard
 system     SHOULD_LINEMERGE = True
 system     TRANSFORMS = 
 system     TRUNCATE = 10000
 system     maxDist = 100

+++++++++++++++++++++++++++++++++++++++++
So after trying many, many "sure things" Splunk has decided not to play nicely.

Any thoughts on what I might have missed?

kristian_kolb
Ultra Champion

Did you try the following?

SHOULD_LINEMERGE = false
LINE_BREAKER = </LiveCall>([\r\n]+)
MAX_TIMESTAMP_LOOKAHEAD = 24
TIME_FORMAT = %d %b %Y %H:%M:%S.%3N
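
As a quick sanity check on the two timestamp settings above, the sketch below confirms that the timestamp in the sample line is exactly 24 characters long and that it parses with an equivalent format. This is Python, not Splunk: Python's strptime has no %3N, so %f stands in for the three fractional digits, and %b assumes an English locale. The variable names are made up for illustration.

```python
from datetime import datetime

# First line of the sample event from the original post.
line = ("25 Apr 2012 14:38:39.430 [INFO] [WorkerThread45] "
        "[com.blah.accessor.LeadIntakeManagerAccessor] - "
        "RealTimeCall Bean message received is: <LiveCall>")

LOOKAHEAD = 24  # same value as MAX_TIMESTAMP_LOOKAHEAD above
prefix = line[:LOOKAHEAD]

# Python stand-in for "%d %b %Y %H:%M:%S.%3N": %f accepts 1 to 6 digits.
parsed = datetime.strptime(prefix, "%d %b %Y %H:%M:%S.%f")

print(repr(prefix))
print(parsed.isoformat())
```

If the prefix were cut short or ran into the "[INFO]" text, strptime would raise a ValueError, which is roughly the situation a too-small or too-large lookahead creates.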

UPDATE:

No you don't need to linemerge. Or. Well. Maybe you do. I don't have access to your logs.

But the point of LINE_BREAKER is that Splunk does not see 'events' or 'lines' at this stage in the parsing. It's just a stream of data flowing through the parser. When the regex matches, Splunk says "Stop! Break Time!" and creates a new event (removing the captured group, i.e. the newline just after </LiveCall> in the process).

Then data starts flowing again until the regex matches the next time. Thus the resulting events can be either single-line or multiline.
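
To make the stream-splitting idea concrete, here is a rough Python approximation. It is not Splunk's implementation: the helper name break_stream and the shortened sample text are made up for illustration. Text up to the first capture group closes the current event, the captured text is discarded, and whatever follows opens the next event.

```python
import re

def break_stream(stream, line_breaker):
    """Hypothetical helper that mimics the LINE_BREAKER description above."""
    pattern = re.compile(line_breaker)
    if pattern.groups < 1:
        raise ValueError("LINE_BREAKER needs at least one capture group")
    events, start = [], 0
    for match in pattern.finditer(stream):
        if match.start(1) == -1:
            continue  # the capture group did not take part in this match
        events.append(stream[start:match.start(1)])  # close the current event
        start = match.end(1)                         # drop the captured text
    if start < len(stream):
        events.append(stream[start:])                # trailing partial event
    return events

sample = (
    "25 Apr 2012 14:38:39.430 [INFO] message received is: <LiveCall>\n"
    "blah\n"
    "</LiveCall>\n"
    "25 Apr 2012 14:38:40.101 [INFO] message received is: <LiveCall>\n"
    "blah\n"
    "</LiveCall>\n"
)

events = break_stream(sample, r"</LiveCall>([\r\n]+)")
for number, event in enumerate(events, start=1):
    print(f"--- event {number} ---")
    print(event)
```

Each resulting event keeps its internal newlines, which is why no line merging is needed after this kind of break.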

/kristian

MasterOogway
Communicator

Yeah...that seemed like a conflict in configs, but at this point I was trying anything.

kristian_kolb
Ultra Champion

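You're welcome. LINE_BREAKER alone won't give you these event boundaries unless SHOULD_LINEMERGE = false; with line merging left on, Splunk merges the broken pieces back together according to the merge rules. Should have stated that more clearly, perhaps.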

/k

MasterOogway
Communicator

Added Kristian's last line and reran, without success:
SHOULD_LINEMERGE = false

Kristian...thanks for the ideas.

MasterOogway
Communicator

Here is the last props.conf I added....without success.

LINE_BREAKER = ([\r\n]+)
MAX_TIMESTAMP_LOOKAHEAD = 24
TIME_FORMAT = %d %b %Y %H:%M:%S.%3N

If anyone has any more ideas, I will gladly run them against some data. If I don't hear anything, I will open a ticket with Splunk and report back here if they find the root cause.

MasterOogway
Communicator

That is exactly how I would describe LINE_BREAKER too. But the reason I had to post is that Splunk is NOT breaking in this manner, and around the last event.
I will be trying your LINE_BREAKER & TIME_FORMAT example and repost the results.

kristian_kolb
Ultra Champion

see update above. /k

MasterOogway
Communicator

Kristian.....I like your LINE_BREAKER line but I need to have Linemerge on my data. I will try bits of yours and get back.

sowings
Splunk Employee

I'd probably do LINE_BREAKER = ([\r\n]+)(?=\d{1,2}\s+\w{3}\s+\d{4}\s+\d{2}\:\d{2}\:\d{2}\.\d{3})

That is: "Break on a newline which occurs before a likely timestamp."
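
The pattern can be tried outside Splunk before it goes into props.conf. The sketch below runs it through Python's re module, which handles this lookahead the same way as far as I know, though Splunk uses its own regex engine. The shortened sample text and the cutting loop are assumptions for illustration only.

```python
import re

# The LINE_BREAKER suggested above, copied verbatim.
LINE_BREAKER = r"([\r\n]+)(?=\d{1,2}\s+\w{3}\s+\d{4}\s+\d{2}\:\d{2}\:\d{2}\.\d{3})"

sample = (
    "25 Apr 2012 14:38:39.430 [INFO] [WorkerThread45] - message received is: <LiveCall>\n"
    "\n"
    "blah\n"
    "</LiveCall>\n"
    "25 Apr 2012 14:38:40.101 [INFO] [WorkerThread46] - message received is: <LiveCall>\n"
    "blah\n"
    "</LiveCall>\n"
)

# Cut the sample wherever the pattern matches, dropping the captured newline(s).
events, start = [], 0
for match in re.finditer(LINE_BREAKER, sample):
    events.append(sample[start:match.start(1)])
    start = match.end(1)
events.append(sample[start:])

for number, event in enumerate(events, start=1):
    print(f"event {number} starts with: {event[:24]!r}")
```

Newlines inside an event are left alone because nothing timestamp-like follows them, so only the newline ahead of the second "25 Apr 2012" produces a break.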

MasterOogway
Communicator

sowings....Just tried this LINE_BREAKER and still no luck. Will try Kristian next.

sowings
Splunk Employee

You can add a \ before an underscore to get a literal underscore (otherwise, it's treated like a markup indicator).

Ayn
Legend

You need to have a matching group in the regex defined for the LINE_BREAKER. This matching group is what Splunk will consider to be the "newline" (and therefore break on + remove).

But, I'm not sure why you would want to set a custom LINE_BREAKER for these logs if they're already separated by newlines and it seems totally possible to handle this with line merging settings instead. I find that it's almost always much easier to deal with line merging settings than with line breaking. If the event really ends at "</LiveCall>" and is followed by the start of a new event marked by a timestamp, I don't see why Splunk wouldn't break the events correctly right from the start. If it doesn't though, setting SHOULD_LINEMERGE=true and MUST_BREAK_AFTER=</LiveCall> should do the trick.
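
For the capture group point, a quick way to check a candidate LINE_BREAKER is to count its groups. The sketch below does that in Python for the values that appear in this thread. The labels are made up, and Python's re is only a stand-in for Splunk's regex engine, although group counting works the same for these simple patterns.

```python
import re

# LINE_BREAKER values seen in this thread; the labels are just for illustration.
candidates = {
    "original post": r'"<\/LiveCall>"',
    "btool output": r"<\/LiveCall>",
    "kristian_kolb": r"</LiveCall>([\r\n]+)",
    "newline only": r"([\r\n]+)",
}

# re.compile(...).groups reports how many capture groups a pattern defines.
group_counts = {label: re.compile(rx).groups for label, rx in candidates.items()}
missing_group = [label for label, count in group_counts.items() if count == 0]

for label, count in group_counts.items():
    print(f"{label:15} capture groups: {count}")
```

The first two values have no capture group at all, so there is nothing for Splunk to treat as the "newline".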

Ayn
Legend

I doubt it's "random". Most likely it's breaking just as it's configured to do; we just don't know yet which configuration rule is causing this behaviour. Find out when and why it's breaking and you might well solve the whole issue. How long are these events, for instance? Maybe you're running into the MAX_EVENTS limit (the default is 256 lines per event). Can you paste samples here or on pastebin?
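
One way to check the MAX_EVENTS theory is to count the lines in each event of the raw log. The sketch below is a minimal Python example on synthetic data. The helper name lines_per_event, the timestamp regex, and the sample lines are all assumptions, so point it at real lines from the log file to get a meaningful answer.

```python
import re

MAX_EVENTS = 256  # value shown in the btool output in the original post
TIMESTAMP = re.compile(r"^\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}")

def lines_per_event(lines):
    """Count lines per event, treating each timestamped line as an event start."""
    counts = []
    for line in lines:
        if TIMESTAMP.match(line) or not counts:
            counts.append(0)
        counts[-1] += 1
    return counts

# Synthetic data: one 3-line event followed by one 300-line event.
log = ["25 Apr 2012 14:38:39.430 [INFO] <LiveCall>", "blah", "</LiveCall>"]
log += ["25 Apr 2012 14:38:40.101 [INFO] <LiveCall>"] + ["blah"] * 298 + ["</LiveCall>"]

counts = lines_per_event(log)
too_long = [n for n in counts if n > MAX_EVENTS]

print("lines per event:", counts)
print("events over MAX_EVENTS:", too_long)
```

Any event longer than MAX_EVENTS lines would be cut in the middle when line merging is on, which could look like a random break.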

MasterOogway
Communicator

Randomly.....sometimes on a date; sometimes in the middle of the multiline event. Just random.

Ayn
Legend

How is it breaking with default settings?

MasterOogway
Communicator

Your thoughts are dead on.....the only issue is I have already tried that setup. Seems the most logical. The btool output listed above only shows from my first example. But I can assure you I have tried MUST_BREAK_AFTER= coupled with SHOULD_LINEMERGE=true for this source. Any other thoughts?
I must be missing something because usually Splunk is VERY good at linebreaking based on Dates, especially when told it is the first 24 characters of any event.
