On a remote host I have a log file that contains a mix of single-line and multiline events in the following format:
ERROR [timestamp1] [id1] [message1_1]
ERROR [timestamp1] [id1] [message1_2]
ERROR [timestamp1] [id1] [message1_3]
ERROR [timestamp2] [id2] [message2_1]
ERROR [timestamp3] [id3] [message3_1]
So, does Splunk automagically treat multiple lines with the same timestamp as one event block, or should I use a heavy forwarder to extract the events based on timestamp and id?
By default Splunk breaks events by timestamp, allowing for multiline events. Something like this:
ERROR [timestamp1] [id1] long
message
here
ERROR [timestamp2] [id2] short message here
will result in two events. In your example, you have five timestamps where three happen to be identical. You will get five events, one for each timestamp. This makes sense as the default configuration because different log events may happen at the same time but might be entirely unrelated to each other.
In order to persuade Splunk to smush multiple timestamped lines into one event you would need an event-breaking pattern somewhere in the log to tell Splunk "every time you read this pattern you should start a new event". That would have to apply to single-line events in that source type as well.
Thanks for the clarification.
In order to achieve that you would need to strip out the duplicate timestamps and IDs from all subsequent lines prior to indexing. That's possible with a scripted input, but you need to implement the processing yourself. Splunk can do regex-based transformations before indexing, but recognizing that your timestamps and IDs are equal goes beyond the expressive power of regular expressions.
An easier way would be to fix the logging to avoid the duplicate prefixes.
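As a minimal sketch of what such a scripted input could do, here is how the merge logic might look in Python. The regex and the field names are illustrative, assuming the exact `ERROR [timestamp] [id] message` format from the question; adjust them to your real log:

```python
import re

# Matches lines like: ERROR [timestamp] [id] message
# (assumed format from the question; adapt the pattern to your actual log)
LINE_RE = re.compile(r"^ERROR \[([^\]]+)\] \[([^\]]+)\] (.*)$")

def merge_events(lines):
    """Merge consecutive lines that share the same timestamp and id
    into single multiline events, dropping the duplicate prefixes."""
    events = []
    prev_key = None
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # skip lines that don't match the expected format
        ts, event_id, message = m.groups()
        key = (ts, event_id)
        if key == prev_key:
            # Same timestamp and id: append to the previous event's message
            events[-1]["message"] += "\n" + message
        else:
            events.append({"time": ts, "id": event_id, "message": message})
            prev_key = key
    return events
```

A scripted input would read the file, run something like this over it, and emit one merged record per event for Splunk to index.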
But that is a simple regular expression matching static text. What I would like to achieve is for Splunk to load a text block from the log file:
ERROR [timestamp] [id1] line1
ERROR [timestamp] [id1] line2
and turn it into an event block with the values:
_time = timestamp,
id=[id1]
_raw=line1\nline2
By "pattern" I mean "regular expression". For example, if you start events every time the regular expression "f.o.o" matches, you need to specify this in props.conf for the source type:
BREAK_ONLY_BEFORE=f.o.o
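For context, that setting goes in a sourcetype stanza in props.conf; a minimal sketch (the stanza name here is illustrative):

```ini
# props.conf (sourcetype name is an example)
[my_error_log]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = f.o.o
```

`BREAK_ONLY_BEFORE` only takes effect when `SHOULD_LINEMERGE` is enabled.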
Yes, Splunk can split multiline messages into events; however, in this case there is no simple static pattern in the text. So my question is: can Splunk group events based on a regex, or should I put a simple divider line between the messages in the logs?