On a remote host I have a log file that contains a mix of single-line and multiline events in the following format:
ERROR [timestamp1] [id1] [message1_1]
ERROR [timestamp1] [id1] [message1_2]
ERROR [timestamp1] [id1] [message1_3]
ERROR [timestamp2] [id2] [message2_1]
ERROR [timestamp3] [id3] [message3_1]
So, does Splunk automagically treat multiple lines with the same timestamp as one event block, or should I use a heavy forwarder to extract the events based on timestamp and id?
By default Splunk breaks events by timestamp, allowing for multiline events. Something like this:
ERROR [timestamp1] [id1] long
message
here
ERROR [timestamp2] [id2] short message here
will result in two events. In your example, you have five timestamps where three happen to be identical. You will get five events, one for each timestamp. This makes sense as the default configuration because different log events may happen at the same time but might be entirely unrelated to each other.
In order to persuade Splunk to smush multiple timestamped lines into one event you would need an event-breaking pattern somewhere in the log to tell Splunk "every time you read this pattern you should start a new event". That would have to apply to single-line events in that source type as well.
Thanks for the clarification.
In order to achieve that you would need to strip out the duplicate timestamps and IDs from all subsequent lines prior to indexing. That's possible with a scripted input, but you need to implement the processing yourself. Splunk can do regex-based transformations before indexing, but recognizing that your timestamps and IDs are equal goes beyond the expressive power of regular expressions.
An easier way would be to fix the logging to avoid the duplicate prefixes.
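As a minimal sketch of what such a scripted input could do, here is how the merge logic might look in Python. The regex and the field names are illustrative, assuming the exact `ERROR [timestamp] [id] message` format from the question; adjust them to your real log:

```python
import re

# Matches lines like: ERROR [timestamp] [id] message
# (assumed format from the question; adapt the pattern to your actual log)
LINE_RE = re.compile(r"^ERROR \[([^\]]+)\] \[([^\]]+)\] (.*)$")

def merge_events(lines):
    """Merge consecutive lines that share the same timestamp and id
    into single multiline events, dropping the duplicate prefixes."""
    events = []
    prev_key = None
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # skip lines that don't match the expected format
        ts, event_id, message = m.groups()
        key = (ts, event_id)
        if key == prev_key:
            # Same timestamp and id: append to the previous event's message
            events[-1]["message"] += "\n" + message
        else:
            events.append({"time": ts, "id": event_id, "message": message})
            prev_key = key
    return events
```

A scripted input would read the file, run something like this over it, and emit one merged record per event for Splunk to index.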
But that is a simple regular expression matching static text. What I would like to achieve is for Splunk to load a text block from the log file:
ERROR [timestamp] [id1] line1
ERROR [timestamp] [id1] line2
and turn it into an event block with the values:
_time = timestamp,
id=[id1]
_raw=line1\nline2
By "pattern" I mean "regular expression". For example, if you start events every time the regular expression "f.o.o" matches, you need to specify this in props.conf for the source type:
BREAK_ONLY_BEFORE=f.o.o
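For context, that setting goes in a sourcetype stanza in props.conf; a minimal sketch (the stanza name here is illustrative):

```ini
# props.conf (sourcetype name is an example)
[my_error_log]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = f.o.o
```

`BREAK_ONLY_BEFORE` only takes effect when `SHOULD_LINEMERGE` is enabled.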
Yes, Splunk can split multiline messages into events; however, in this case there is no simple static pattern in the text. So my question is: can Splunk group events based on a regex, or should I put a simple divider line between the messages in the logs?