I'm indexing some Java application log files written with the log4j framework. The log files are intermixed with CXF logging interceptor statements that record inbound/outbound SOAP messages in the following format:
2014-07-16 10:25:13,812 INFO WebContainer : 16 - Inbound Message
----------------------------
ID: 15231
Response-Code: 200
Encoding: UTF-8
Content-Type: text/xml;charset=UTF-8
Headers: {Content-Length=[5612], content-type=[text/xml;charset=UTF-8], Date=[Wed, 16 Jul 2014 15:25:13 GMT], Server=[Jetty(7.1.6.v20100715)]}
Payload: <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Header></soap:Header><soap:Body><MyXmlMessage></MyXmlMessage></soap:Body></soap:Envelope>
----------------------------
I'd like to index these statements as single, multi-line events, but Splunk seems to be randomly truncating the events after the following line: "Content-Type: text/xml;charset=UTF-8"
That is, some events include the full context (including the payload), whereas others only include everything up to the Content-Type line.
Here's what my props.conf looks like:
[default]
CHARSET = UTF-8
LINE_BREAKER_LOOKBEHIND = 100
TRUNCATE = 0
DATETIME_CONFIG = /etc/datetime.xml
ANNOTATE_PUNCT = True
HEADER_MODE =
MAX_DAYS_HENCE=2
MAX_DAYS_AGO=2000
MAX_DIFF_SECS_AGO=3600
MAX_DIFF_SECS_HENCE=604800
MAX_TIMESTAMP_LOOKAHEAD = 128
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE =
BREAK_ONLY_BEFORE_DATE = True
MAX_EVENTS = 20000
MUST_BREAK_AFTER =
MUST_NOT_BREAK_AFTER =
MUST_NOT_BREAK_BEFORE =
[log4j]
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 30
#BREAK_ONLY_BEFORE = \d\d?:\d\d:\d\d
BREAK_ONLY_BEFORE=^\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{3}
NO_BINARY_CHECK = true
pulldown_type = true
maxDist = 75
Can anyone explain why Splunk would be truncating the events prematurely?
The props seem correct, especially the BREAK_ONLY_BEFORE.
Here is my props.conf entry for log4j:
[log4j]
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25
BREAK_ONLY_BEFORE=^\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{3}
NO_BINARY_CHECK = 1
pulldown_type = true
maxDist = 75
FYI, I've checked the log files manually and there are no special characters that would be tripping up Splunk -- all lines end with a line feed character.
I think the timestamp in the Headers line (Date=[Wed, 16 Jul 2014 15:25:13 GMT]), in combination with some of your other options, is tripping it up.
I made a small sample file and got proper breaking with something as simple as this for the sourcetype:
# chanfoli's settings
MAX_TIMESTAMP_LOOKAHEAD=25
NO_BINARY_CHECK=1
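To sketch why a short lookahead might help: as I understand it, with BREAK_ONLY_BEFORE_DATE = True (as in the [default] stanza above) Splunk may start a new event at any line where it finds a date, and the Headers line in the sample carries Date=[Wed, 16 Jul 2014 15:25:13 GMT] well past the first 25 characters. A more explicit variant would switch date-based breaking off and rely only on the log4j timestamp at the start of a line. This is untested here, and the stanza name is assumed to be the log4j sourcetype from the question:

```
[log4j]
# Assumption: the sourcetype is named log4j, as in the question
SHOULD_LINEMERGE = true
# Do not start a new event merely because a line contains a date
BREAK_ONLY_BEFORE_DATE = false
# Start a new event only at a log4j-style timestamp at the start of a line
BREAK_ONLY_BEFORE = ^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
# The timestamp "2014-07-16 10:25:13,812" is 23 characters long
MAX_TIMESTAMP_LOOKAHEAD = 25
```

If the Date header is the culprit, that would also fit the random pattern, since only the events whose Headers line contains a Date entry within the lookahead window would be split.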
FYI, I manually checked the log files and the lines all end with line feeds...
I applied the changes and still see the same problem: somewhere between 25% and 50% of the events for the CXF log statements are being cut off after the "Content-Type: text/xml;charset=UTF-8" line. I really don't know what's tripping it up there.
Doh! I was applying the configuration to the forwarders. I applied the update to the indexer and it seems to be working now, thanks!
The parsing is not happening at the universal/lightweight forwarder level, so it should not make a difference.
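For what it's worth, line merging and timestamp extraction run on the first full Splunk instance that parses the data (an indexer or a heavy forwarder), so a stanza like the one below would belong there rather than on a universal forwarder. This is only a sketch of an alternative that skips line merging entirely; the file location and the regex are assumptions based on the sample in the question, not tested settings:

```
# Assumed location on the indexer: $SPLUNK_HOME/etc/system/local/props.conf
[log4j]
# Let LINE_BREAKER alone decide where events start
SHOULD_LINEMERGE = false
# Break only where a newline is followed by a log4j timestamp;
# the captured newline(s) are discarded as the event delimiter
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 25
```

The parsing instance normally has to be restarted before props.conf changes take effect, and only newly indexed data is affected.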
Thanks for the suggestion, but it didn't seem to have any effect -- the behaviour is still the same.
FYI, yes I've made the changes to props.conf on my universal forwarders and re-started them afterwards.