Solved: Why is Splunk Truncating Multi-line Events?

sjnorman · ‎07-16-2014

I'm indexing some Java application log files that use the log4j framework to output log messages. The log files are intermixed with CXF logging interceptor statements that log inbound/outbound SOAP messages that have the following format:

2014-07-16 10:25:13,812 INFO  WebContainer : 16 - Inbound Message
---------------------------- 
ID: 15231
Response-Code: 200
Encoding: UTF-8
Content-Type: text/xml;charset=UTF-8
Headers: {Content-Length=[5612], content-type=[text/xml;charset=UTF-8], Date=[Wed, 16 Jul 2014 15:25:13 GMT], Server=[Jetty(7.1.6.v20100715)]}
Payload: <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Header></soap:Header><soap:Body><MyXmlMessage></MyXmlMessage></soap:Body></soap:Envelope>
----------------------------

I'd like to log these statements as single, multi-line events but Splunk seems to be randomly truncating the events after the following line: "Content-Type: text/xml;charset=UTF-8"

i.e. some events include the full context (including the payload), whereas others only include up to the content-type.

Here's what my props.conf looks like:

[default]
CHARSET = UTF-8
LINE_BREAKER_LOOKBEHIND = 100
TRUNCATE = 0
DATETIME_CONFIG = /etc/datetime.xml
ANNOTATE_PUNCT = True
HEADER_MODE =
MAX_DAYS_HENCE=2
MAX_DAYS_AGO=2000
MAX_DIFF_SECS_AGO=3600
MAX_DIFF_SECS_HENCE=604800
MAX_TIMESTAMP_LOOKAHEAD = 128
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = 
BREAK_ONLY_BEFORE_DATE = True
MAX_EVENTS = 20000 
MUST_BREAK_AFTER = 
MUST_NOT_BREAK_AFTER = 
MUST_NOT_BREAK_BEFORE = 


[log4j]
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 30
#BREAK_ONLY_BEFORE = \d\d?:\d\d:\d\d
BREAK_ONLY_BEFORE=^\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{3}
NO_BINARY_CHECK = true
pulldown_type = true 
maxDist = 75

Can anyone explain why Splunk would be truncating the events prematurely?

yannK · ‎07-16-2014

The props seem correct, especially the BREAK_ONLY_BEFORE.

Try adding BREAK_ONLY_BEFORE_DATE = false, and make sure that props.conf is deployed on the indexers and heavy forwarders (if any), because those are the instances that parse the events.
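
A minimal sketch of how the [log4j] stanza might look with that setting added. Merging it with the settings from the question is an assumption; the thread does not show the final file. It would go in props.conf on the indexer (or heavy forwarder), not on a universal forwarder:

[log4j]
# Merge lines into multi-line events; break only where the regex below matches.
SHOULD_LINEMERGE = true
# Do not start a new event just because a line carries a date,
# presumably what happens at the Date=[...] value on the Headers line.
BREAK_ONLY_BEFORE_DATE = false
BREAK_ONLY_BEFORE = ^\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{3}
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 25
NO_BINARY_CHECK = true

Restart the instance that parses the data after changing the file. Events that are already indexed keep their old boundaries.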

sjnorman · ‎07-17-2014

Here is my props.conf entry for log4j:

[log4j] 
TIME_FORMAT = %Y-%m-%d %H:%M:%S 
TIME_PREFIX = ^ 
MAX_TIMESTAMP_LOOKAHEAD = 25
BREAK_ONLY_BEFORE=^\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{3} 
NO_BINARY_CHECK = 1 
pulldown_type = true
maxDist = 75

sjnorman · ‎07-17-2014

FYI, I've checked the log files manually and there are no special characters that would be tripping up Splunk -- all lines end with a line feed character.

chanfoli · ‎07-16-2014

I think the timestamp in the payload line in combination with some of your other options is tripping it up.

I made a small sample file and got proper breaking with something as simple as this for the sourcetype:

# chanfoli's settings
MAX_TIMESTAMP_LOOKAHEAD=25
NO_BINARY_CHECK=1
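
An alternative that nobody in this thread proposed is to turn line merging off and let LINE_BREAKER split the stream only where a newline is followed by a log4j timestamp. This is a sketch, assuming every log4j record starts with a timestamp of the form shown in the sample (2014-07-16 10:25:13,812):

[log4j]
# No line merging; event boundaries come from LINE_BREAKER alone.
SHOULD_LINEMERGE = false
# The first capture group (the newlines) is discarded as the delimiter;
# the timestamp that follows stays at the start of the next event.
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 25
# Large SOAP payloads can exceed the default limit of 10000 bytes per line; 0 disables the limit.
TRUNCATE = 0

Lines without a leading timestamp (the ID, Headers and Payload lines) then stay attached to the record that precedes them.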

sjnorman · ‎07-17-2014

FYI, I manually checked the log files and the lines all end with line feeds...

sjnorman · ‎07-17-2014

I applied the changes and still suffer from the same problem...somewhere between 25% and 50% of the events for the CXF log statements are being cut off after the "Content-Type: text/xml; charset=UTF-8" line. I really don't know what's tripping it up there.

sjnorman · ‎07-17-2014

Doh! I was applying the configuration to the forwarders. I applied the update to the indexer and it seems to be working now, thanks!

yannK · ‎07-16-2014

Parsing does not happen at the universal/lightweight forwarder level, so changing props.conf there should not make a difference.

sjnorman · ‎07-16-2014

Thanks for the suggestion, but it didn't seem to have any effect -- the behaviour is still the same.

FYI, yes I've made the changes to props.conf on my universal forwarders and re-started them afterwards.
