Getting Data In

Why are my multi-line events getting broken in the middle of a string?

techols
New Member

I have a log that contains multi-line events, some events contain java stack traces. Here is an example log:

INFO  2017-02-06 17:57:36,026 com.loadbalancer.http.SynchHttpProducer [82] - Details of successful transactions: 
2017-02-06 17:57:35, 727 [SECU] AUTH date:2017-02-06PST17:57:35-0800 login:loging status:SUCCESS
2017-02-06 17:57:35, 727 [SECU] AUTH date:2017-02-06PST17:57:35-0800 IP:fqdn login:login status:SUCCESS

INFO  2017-02-06 17:57:39,111 com.targethealthcheck.HeartbeatHealthCheckService [374] - TargetHost for IP fqdn marked as AVAILABLE

INFO  2017-02-06 17:58:05,524 com.monitoring.AbstractPrinter [90] - [STAT][PING] 
[STAT][PING] Checked by Manager : true
[STAT][PING] Checked by Load Balancer : false
INFO  2017-02-06 17:58:05,525 com.alu.e3.monitoring.AbstractPrinter [90] - [STAT][PROV] 
INFO  2017-02-06 17:58:05,526 com.alu.e3.monitoring.AbstractPrinter [90] - [STAT][TOPO] 

WARN  2017-02-06 17:58:05,543 com.monitoring.AbstractPrinter [94] - Error during printing stats for category SYNCHRO
java.lang.NullPointerException
    at com.monitoring.StatMaker.getVariance(StatMaker.java:89)[494:e3-bundle:6.4.4.1]
    at com.monitoring.StatMaker.getSigma(StatMaker.java:101)[494:e3-bundle:6.4.4.1]
    at com.monitoring.StatMaker.toString(StatMaker.java:136)[494:e3-bundle:6.4.4.1]
    at java.lang.String.valueOf(Unknown Source)[:1.7.0_79]
    at java.lang.StringBuilder.append(Unknown Source)[:1.7.0_79]
    at com.monitoring.SpeakerSynchronizationTime.logStatsGateway(SpeakerSynchronizationTime.java:235)[494:e3-bundle:6.4.4.1]

For this I used the following props.conf:

[ngeag_e3]
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = true
TRUNCATE = 0
MAX_EVENTS = 2048
BREAK_ONLY_BEFORE = FATAL[\s]+|ERROR[\s]+|INFO[\s]+|WARN[\s]+|DEBUG[\s]+
DATETIME_CONFIG =
TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%3N"
TIME_PREFIX = FATAL[\s]+|ERROR[\s]+|INFO[\s]+|WARN[\s]+|DEBUG[\s]+
MAX_TIMESTAMP_LOOKAHEAD = 23
category = Custom
description = Log file containing AME events
disabled = false
pulldown_type = true

LINE_BREAKER with SHOULD_LINEMERGE = false did not work for me, because the LINE_BREAKER would have to be:

LINE_BREAKER = (FATAL[\s]+|ERROR[\s]+|INFO[\s]+|WARN[\s]+|DEBUG[\s]+)

This would strip the log type (FATAL|ERROR|INFO|WARN|DEBUG) and also the time prefix, so timestamp extraction stops working. The config above mostly works, but my events are sporadically being split in unexpected places. When I show source, I see strange modifications where a log line gets split mid-word:

INFO  2017-02-06 17:57:36,026 com.loadbalancer.http.SynchHttpProducer [82] - D
etails of successful transactions: 
2017-02-06 17:57:35, 727 [SECU] AUTH date:2017-02-06PST17:57:35-0800 login:loging status:SUCCESS
2017-02-06 17:57:35, 727 [SECU] AUTH date:2017-02-06PST17:57:35-0800 IP:fqdn login:login status:SUCCESS

INFO  2017-02-06 17:57:39,111 com.targethealthcheck.HeartbeatHealthCheckService [374] - TargetHost for IP fqdn marked as AVA
ILABLE

INFO  2017-02-06 17:58:05,524 com.monitoring.AbstractPrinter [90] - [STAT][PING] 
[STAT][PING] Checked by Manager : true
[STAT]
[PING] Checked by Load Balancer : false

Not only is the line getting split, but it is not merging into the prior event. Any ideas why the lines are getting split or why they are not getting merged?


tmarrazzo
Engager

Focusing just on the line breaking, try the following adjustments:

SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\w+\s+\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}

As you discovered, whatever the line breaker's capture group matches (when line merge is set to false) is deleted from the indexed event. To account for this, put only the carriage return/newline inside the capture group, then follow it with a positive lookahead on something that consistently defines the start of your events. In your case that looks like a word, followed by a number of spaces, followed by your timestamp.
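To make the lookahead explicit, so the pattern after the newline is checked but never consumed, a minimal sketch (same timestamp pattern as above; LINE_BREAKER regexes are PCRE, so lookaheads are supported):

SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\w+\s+\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3})

With this form, the break still happens at the newline, but the log level and timestamp stay in the next event.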

Setting line merge to false should also be more efficient.

I would also advise against setting TRUNCATE to 0. If your application suddenly begins spewing garbage data, Splunk will not truncate the event and will happily continue its indexing duties.
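For example (an illustrative cap, not a recommendation for your data; Splunk's default is 10000 bytes, so pick something comfortably above your longest legitimate event):

TRUNCATE = 100000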

gvmorley
Contributor

I'd also suggest going with SHOULD_LINEMERGE = false and keeping only the [\r\n]+ inside the capture group.

You could also combine this answer with your original approach and go for:

SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?:FATAL[\s]+|ERROR[\s]+|INFO[\s]+|WARN[\s]+|DEBUG[\s]+)

(?:) in regex is a non-capturing group, which is handy for this sort of grouping.


techols
New Member

Strangely, LINE_BREAKER = ([\r\n]+)\w+\s+\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3} removes the \w+ part even though it is not in the capture group.

LINE_BREAKER = ([\r\n]+)(?:FATAL[\s]+|ERROR[\s]+|INFO[\s]+|WARN[\s]+|DEBUG[\s]+) removes the first letter of the word, so I get ATAL, RROR, NFO, ARN, or EBUG.

I can't figure out why it is removing characters that are not in the capture group. I tried a named capture group to save the log type string as a field I could search on, but that didn't seem to work.


gvmorley
Contributor

Very strange.

This was the props.conf which I created to test with:

[not-breaking]
TRUNCATE = 0
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?:FATAL[\s]+|ERROR[\s]+|INFO[\s]+|WARN[\s]+|DEBUG[\s]+)
MAX_EVENTS = 100000
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N

Are you using something similar?
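If you're unsure which settings are actually taking effect, btool will show the merged configuration and which .conf file each line comes from (standard Splunk CLI, run from $SPLUNK_HOME/bin):

./splunk btool props list not-breaking --debug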


hsesterhenn
Path Finder

Hi,

Why don't you use the timestamp as the event break?

https://docs.splunk.com/Documentation/Splunk/6.5.2/Data/Configureeventlinebreaking

BREAK_ONLY_BEFORE_DATE

In your case this should work pretty well, because every event starts with the log level and a timestamp, no matter whether a Java stack trace or anything else follows.
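As a minimal sketch against the stanza from the question (untested; it keeps your timestamp settings, drops the custom break pattern, and note that BREAK_ONLY_BEFORE_DATE only applies while SHOULD_LINEMERGE = true):

[ngeag_e3]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = true
MAX_EVENTS = 2048
TIME_PREFIX = FATAL\s+|ERROR\s+|INFO\s+|WARN\s+|DEBUG\s+
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
MAX_TIMESTAMP_LOOKAHEAD = 23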

HTH,

Holger


techols
New Member

Strangely, when I used BREAK_ONLY_BEFORE_DATE = true, it ignored the TIME_PREFIX and broke at the wrong dates, apparently the timestamps on the [SECU] lines inside the events.
