I'm having an issue where some Bro SMTP log entries are combined in Splunk into a single event instead of being broken into separate events. The logs are forwarded from a Linux machine by a universal forwarder to a Windows-based indexer.
Viewing the text log file manually shows each line as an individual entry. None of the events in the text log file is more than a single line.
I've only seen this occurring in the SMTP log and it does not happen 100% of the time.
No changes have been made to any of the default config files, other than adding my inputs.
The default handling of log files by Splunk is to break on timestamp, so usually the symptom of many entries being combined is really a matter of Splunk not recognizing the per-item timestamp (or there not being one).
Does this data have a pre-configured sourcetype, or do you need to create one? TIME_FORMAT, TIME_PREFIX, and SHOULD_LINEMERGE are usually the useful settings here.
Of the 8300 Bro SMTP events in the last 24 hours, 2300 have 2 or more lines, so roughly 28% of the events are being interpreted by Splunk as more than a single line. It is a pretty big mystery, because from everything I've read, setting SHOULD_LINEMERGE to false should mean one event per line.
How often do you get more than two Bro events in one Splunk event? I ran a search on my data for the last 24 hours, looking for events where linecount>=2, and it only happened once. I still do not understand why that one event did not break properly.
My other question concerns MAX_TIMESTAMP_LOOKAHEAD, which is set to 20 even though the timestamp is always the first 17 characters. If that isn't the cause, there has to be some sort of bug in play.
The universal forwarder sends the log as the bro sourcetype. There's some magic that happens on the indexer side (that I don't completely understand) that changes the sourcetype from bro to bro_
[bro]
SHOULD_LINEMERGE = false
[(?::){0}bro_*]
SHOULD_LINEMERGE = false
http://answers.splunk.com/answers/139887/splunk-bro-app-and-how-to-separate-data-into-multiple-entri... <-- possibly a similar problem?
FWIW the regex hack to get sourcetype wildcarding is a really bad idea. It's tricking the implementation into thinking there's a :: splitter like source::, and then not providing a known one, which happens to lead to the sourcetype matching. Full behavior may involve bugs, and fixing bugs may break this in the future. I know you didn't make this choice, but I have to put this message out everywhere I can. It's not a documented pattern but clearly it's in use.
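A documented way to scope settings to a group of files, without relying on the sourcetype wildcard trick, is a source:: stanza, which does support wildcards. The sketch below assumes the SMTP log is monitored from a path whose file name starts with smtp and ends in .log; that path pattern is a hypothetical example, not something taken from this thread or from the app.

```ini
# props.conf where the data is parsed: a sketch, not the app's shipped configuration.
# "..." matches any number of directory levels; "smtp*.log" is an assumed file name.
[source::.../smtp*.log]
SHOULD_LINEMERGE = false
```

Settings in a source:: stanza take precedence over those in a sourcetype stanza for the same event, so check for conflicts with the app's own stanzas before relying on this.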
SHOULD_LINEMERGE = false should be sufficient to make each line its own event. Most likely the sourcetype is not being applied somehow, or the settings are not loaded somehow.
Start by verifying the exact sourcetype that the data receives. If it matches the predefined sourcetype then we have a mystery.
The app comes with predefined sourcetypes, so there is no need to create those. SHOULD_LINEMERGE is set to false and TIME_FORMAT is configured as TIME_FORMAT = %s.%6N. There is no setting for TIME_PREFIX. So possibly something about the SMTP log's timestamps could be throwing it off?
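For reference, here is a minimal props.conf sketch that makes the line-breaking and timestamp settings discussed in this thread explicit. The [bro] stanza name, SHOULD_LINEMERGE, and TIME_FORMAT come from the thread; TIME_PREFIX = ^, MAX_TIMESTAMP_LOOKAHEAD = 17, and the LINE_BREAKER value are assumptions added for illustration, not settings confirmed to ship with the app.

```ini
# props.conf: a sketch under the assumptions stated above.
[bro]
# One event per line: never merge lines after breaking.
SHOULD_LINEMERGE = false
# Break on any run of newlines (this is also Splunk's default value).
LINE_BREAKER = ([\r\n]+)
# Assumption: the timestamp starts at the very beginning of each line.
TIME_PREFIX = ^
# Epoch seconds with microseconds, as configured in the app.
TIME_FORMAT = %s.%6N
# Assumption: 10 epoch digits + "." + 6 microsecond digits = 17 characters.
MAX_TIMESTAMP_LOOKAHEAD = 17
```

With a universal forwarder, line breaking and timestamp extraction for this kind of data happen on the indexer, so these settings would need to be in effect there rather than on the forwarder.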