LineBreakingProcessor truncates long events and drops the next (related)

rkilen
Explorer

Running Splunk Enterprise 6.5.6.

I am parsing incoming events of sourcetype weblogic_stdout and am having trouble with the LineBreakingProcessor when it truncates large events. The problematic events come in pairs with nearly the same timestamp: the first contains a Java stack trace, and the second contains a unique error identifier along with a different Java stack trace. Because of the size of the stack trace, the first event is truncated, but LineBreakingProcessor doesn't seem to see the second event at all. The event after that one, however, is indexed by Splunk.

My first thought was that the second event had been truncated away along with the first, but the "line length >=" value corresponded to the number of characters in the first event, so it seems the second event is completely ignored/dropped.

I searched the splunkd.log to find the longest line length, and tried setting TRUNCATE to something a little higher than that, but when I restarted the Search Peer, I found that the largest line length had approximately doubled, which I found very puzzling.

It's also possible that my LINE_BREAKER regex isn't working as designed, but I've been through it several times (and now have some additional eyes looking at it). The props.conf contains the following active lines:

[weblogic_stdout]
DATETIME_CONFIG = /etc/apps/spste/weblogic_stdout.xml

===> The following works to extract the dates while leaving the text in the event

LINE_BREAKER = ([\r\n]+)([?\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\,\d{3}]?\s|\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s|#{0,4}<\w{3}\s\d{1,2}\,\s\d{4}\s\d{1,2}:\d{2}:\d{2}\s[AP]M\s\w{3,}>\s|\w{3}\s\d{1,2}\,\s\d{4}\s\d{1,2}:\d{2}:\d{2}\s[AP]M\s|\d{2}:\d{2}:\d{2}[.,]\d{3}\s|[DEBUG]\s\d{8}\s\d{2}:\d{2}:\d{2}\,\d{3}\s|[INFO]\s\d{8}\s\d{2}:\d{2}:\d{2}\,\d{3}\s|[ERROR]\s\d{8}\s\d{2}:\d{2}:\d{2}\,\d{3}\s|[WARN]\s\d{8}\s\d{2}:\d{2}:\d{2}\,\d{3}\s)
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = true

TRUNCATE = 80000

The regex alternate pattern that should match is the one with [ERROR] in it. An example start line is:
[ERROR] 20171205 09:58:35.277 [ other stuff...]

The weblogic_stdout.xml file defines the matching pattern, but shouldn't come into play, since the XML file is only for date extraction, right?

rkilen
Explorer

My error was in the LINE_BREAKER. I had inadvertently escaped some [ characters with \ when I really wanted them to serve as alternate character specifications. So the bad version was:
LINE_BREAKER = ([\r\n]+)([?\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\,\d{3}]?\s|\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s|#{0,4}<\w{3}\s\d{1,2}\,\s\d{4}\s\d{1,2}:\d{2}:\d{2}\s[AP]M\s\w{3,}>\s|\w{3}\s\d{1,2}\,\s\d{4}\s\d{1,2}:\d{2}:\d{2}\s[AP]M\s|\d{2}:\d{2}:\d{2}[.,]\d{3}\s|[DEBUG]\s\d{8}\s\d{2}:\d{2}:\d{2}[.,]\d{3}\s|[INFO]\s\d{8}\s\d{2}:\d{2}:\d{2}[.,]\d{3}\s|[ERROR]\s\d{8}\s\d{2}:\d{2}:\d{2}[.,]\d{3}\s|[WARN]\s\d{8}\s\d{2}:\d{2}:\d{2}[.,]\d{3}\s)

The corrected version is:
LINE_BREAKER = ([\r\n]+)([?\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\,\d{3}]?\s|\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s|#{0,4}<\w{3}\s\d{1,2}\,\s\d{4}\s\d{1,2}:\d{2}:\d{2}\s[AP]M\s\w{3,}>\s|\w{3}\s\d{1,2}\,\s\d{4}\s\d{1,2}:\d{2}:\d{2}\s[AP]M\s|\d{2}:\d{2}:\d{2}[.,]\d{3}\s|[DEBUG]\s\d{8}\s\d{2}:\d{2}:\d{2}[.,]\d{3}\s|[INFO]\s\d{8}\s\d{2}:\d{2}:\d{2}[.,]\d{3}\s|[ERROR]\s\d{8}\s\d{2}:\d{2}:\d{2}[.,]\d{3}\s|[WARN]\s\d{8}\s\d{2}:\d{2}:\d{2}[.,]\d{3}\s)

This regex is necessarily complex due to the variety of date/time formats I found were present in my weblogic_stdout events.
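
For readers who want to check this kind of mistake outside Splunk, here is a minimal Python sketch. It uses a simplified fragment of the [ERROR] alternate rather than the full LINE_BREAKER, and the names SAMPLE, PREFIX, PATTERNS and matches are hypothetical. The brackets around ERROR are escaped here so they match the literal text of the example start line; that is an assumption about the intended pattern, since the thread may not display every backslash. Python's re module is also not the regex engine Splunk uses, so treat this as a rough check only.

```python
import re

# Example start line quoted in the original question.
SAMPLE = "[ERROR] 20171205 09:58:35.277 [ other stuff...]"

# Simplified, illustrative fragment: level tag, date, and time up to the seconds.
PREFIX = r"\[ERROR\]\s\d{8}\s\d{2}:\d{2}:\d{2}"

PATTERNS = {
    # Separator accepted only as a literal comma.
    "comma_only": PREFIX + r"\,\d{3}\s",
    # Escaping the "[" turns the intended character class into literal text.
    "escaped_bracket": PREFIX + r"\[.,]\d{3}\s",
    # Character class: accepts either "." or "," before the milliseconds.
    "char_class": PREFIX + r"[.,]\d{3}\s",
}

def matches(pattern, text=SAMPLE):
    """Return True if the pattern matches at the start of the text."""
    return re.match(pattern, text) is not None

for name, pattern in PATTERNS.items():
    print(f"{name}: {matches(pattern)}")
```

Only the char_class variant matches the sample line, because the sample uses a period before the milliseconds and the other two variants cannot accept one at that position.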

sshelly_splunk
Splunk Employee

DATETIME_CONFIG specifies how you want to extract timestamps at ingest time. When DATETIME_CONFIG points to a file, BREAK_ONLY_BEFORE_DATE will most likely not work as expected. Can you post a sample event? I think explicitly stating the timestamp format will probably get you where you want to be.
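
To illustrate what an explicit timestamp format means for the one example start line quoted in the question, here is a small Python sketch. The helper parse_weblogic_timestamp is hypothetical and assumes the line has the shape "[LEVEL] YYYYMMDD HH:MM:SS.mmm ...". In Splunk the equivalent would be expressed through timestamp settings such as TIME_PREFIX and TIME_FORMAT in props.conf, whose format specifiers are not identical to Python's, so this only shows the idea.

```python
from datetime import datetime

def parse_weblogic_timestamp(line):
    """Parse the timestamp from a line shaped like '[LEVEL] YYYYMMDD HH:MM:SS.mmm ...'.

    Hypothetical helper for illustration; it covers only this one layout.
    """
    # Split into: level tag, date, time, and the rest of the line.
    parts = line.split(" ", 3)
    stamp = parts[1] + " " + parts[2]
    # %f accepts one to six fractional digits, so ".277" is read as 277000 microseconds.
    return datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")

ts = parse_weblogic_timestamp("[ERROR] 20171205 09:58:35.277 [ other stuff...]")
print(ts.isoformat())
```

Because the sourcetype mixes several date/time layouts, a single explicit format like this would cover only the events that share this layout.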

rkilen
Explorer

I discovered that I had an error in my LINE_BREAKER regex, and I believe that may have fixed the event-dropping problem. My users will be keeping an eye out, and I'll post back here in a few days.

micahkemp
Champion

Do you mind posting your solution as an answer, and accepting it if you believe it is correct?

rkilen
Explorer

Once I'm fairly sure, I'll post it as an answer.
