Getting Data In

Log file with differing message formats

mikelanghorst
Motivator

I've run across an odd log file from EMC's Data Protection application that is logging two very different log formats into a single file. Example:

2012-03-08 12:06:30,643 INFO Webapp Launcher [Init] Connection to controller at fdpap01.oa.domain.com:3916
2012-03-08 12:06:30,643 INFO Webapp Launcher [Init] Connection to reporter at fdpap01.oa.domain.com:4002
INFO 2560.2564 20120308:123239 service - ServerCtrlHandler(): Service stop signalled - exiting
INFO 2676.2696 20120308:123532 webapp - daemonMain(): Setting memory limit '-Xmx128m'
INFO 2676.2696 20120308:123535 webapp - daemonMain(): DPA Webapp
INFO 2676.2696 20120308:123535 webapp - daemonMain(): (c) 1994-2009 EMC Corporation. All rights reserved.
INFO 2676.2696 20120308:123535 webapp - daemonMain(): Version: 5.0.1 build 4792 on windows
INFO 2676.2696 20120308:123535 webapp - daemonMain(): Logging at level Info
2012-03-08 12:36:01,967 INFO Webapp Launcher [Init] Connection to controller at fdpap01.oa.domain.com:3916
2012-03-08 12:36:01,967 INFO Webapp Launcher [Init] Connection to reporter at fdpap01.oa.domain.com:4002
INFO 2676.2680 20120308:133056 service - ServerCtrlHandler(): Service stop signalled - exiting
INFO 3912.3884 20120308:133135 webapp - daemonMain(): Setting memory limit '-Xmx128m'
INFO 3912.3884 20120308:133135 webapp - daemonMain(): DPA Webapp
INFO 3912.3884 20120308:133135 webapp - daemonMain(): (c) 1994-2009 EMC Corporation. All rights reserved.
INFO 3912.3884 20120308:133135 webapp - daemonMain(): Version: 5.0.1 build 4792 on windows
INFO 3912.3884 20120308:133135 webapp - daemonMain(): Logging at level Info
2012-03-08 13:31:38,752 INFO Webapp Launcher [Init] Connection to controller at fdpap01.oa.domain.com:3916
2012-03-08 13:31:38,752 INFO Webapp Launcher [Init] Connection to reporter at fdpap01.oa.domain.com:4002

Whenever I've had to help Splunk with line breaking and date extraction, the format has been consistent across the entire file: I specified a source or sourcetype and the pattern to break on. I'm unsure how to handle timestamp extraction for this one. For the lines starting with the severity, the third column is the timestamp, and each of these lines should be a separate event. By default, Splunk is currently merging them.

Ideas?

hexx
Splunk Employee

If you can be sure that this data source always has one line per event, the simplest way to fix the line breaking is to set:

SHOULD_LINEMERGE = false

The different time formats might cause a different kind of problem, as Splunk's timestamp extraction heuristics do not handle this situation well.

Still, it is worth seeing how the timestamp extraction behaves once you've fixed the line breaking. You should probably still add, at a minimum:

MAX_TIMESTAMP_LOOKAHEAD = 37

...in order to scope the time stamp extraction as much as we currently can.
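
Taken together, the two settings could sit in a single props.conf stanza, roughly as sketched here. The sourcetype name emc_dpa_webapp is a placeholder and not something from the original post; substitute whatever source or sourcetype the file is actually indexed under. In the sample lines, the later-positioned of the two timestamps ends at character 30, so a lookahead of 37 covers both formats.

# props.conf (sketch; the stanza name is hypothetical)
[emc_dpa_webapp]
# Treat every line as its own event instead of merging lines
SHOULD_LINEMERGE = false
# Only search the first 37 characters of each event for a timestamp
MAX_TIMESTAMP_LOOKAHEAD = 37

These are parsing-time settings, so they should only affect data indexed after the change; events already indexed would keep their current breaking and timestamps.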
