I have a rolling log file that is being monitored and indexed in Splunk. When it reaches a certain size, the file is archived and a new log file is created.
The issue is that indexed events are broken incorrectly at the point where the file rolls over. Can anyone advise on this scenario?
system.log -> rolling log file
Old File Content:
Error encountered in Process A
New File Content (same file)
Error encountered in Process E. Please check accordingly.
Process not completed.
Events indexed in Splunk:
Event A: #### 12-07-2017 4:10:00 PM StandardError
Error encountered in Process A
Event B: Error encountered in Process E. Please check accordingly.
Event C: #### 12-07-2017 4:10:00 PM Lookup Error
Process not completed.
Expectation: Events should not break before the timestamp
Event A: #### 12-07-2017 4:10:00 PM StandardError
Error encountered in Process A
Error encountered in Process E. Please check accordingly.
Event B: #### 12-07-2017 4:10:00 PM Lookup Error
Process not completed.
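For context, event breaking for this format would normally be configured in props.conf along these lines (the sourcetype name here is hypothetical; the regex and time format are inferred from the `#### 12-07-2017 4:10:00 PM ...` header shown above):

```ini
# props.conf -- sourcetype name "system_log" is an assumption
[system_log]
# Merge continuation lines; start a new event only before the "#### <timestamp>" header
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^####\s+\d{2}-\d{2}-\d{4}
# Timestamp follows the "#### " prefix, e.g. "12-07-2017 4:10:00 PM"
TIME_PREFIX = ^####\s+
TIME_FORMAT = %m-%d-%Y %I:%M:%S %p
MAX_TIMESTAMP_LOOKAHEAD = 25
```

Note that a rule like this governs breaking within a single file; as the answers below explain, it cannot rejoin an event that is physically split across two files by the log rotation.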
Appreciate responses. Thank you.
Your biggest problem is that the first line of the new file has no timestamp, so Splunk treats it as the start of a new event. The indexer does not buffer partial results from one file to the next. If you rotate the log on a regular time schedule, it will probably work better than rotating by size. If you must rotate by size, then you may have to live with one event being split at each log rotation. If this is on a Linux machine with logrotate, you can switch the rotation to time-based and it will work better for you.
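As a sketch of that suggestion, a minimal time-based logrotate stanza might look like the following (the config path, log path, and retention count are all assumptions for illustration):

```
# /etc/logrotate.d/system-log  -- hypothetical path
/var/log/system.log {
    daily            # rotate on a time schedule instead of by size
    rotate 7         # keep a week of archives (assumption)
    missingok
    notifempty
    compress
    delaycompress    # keep the newest archive uncompressed so a
                     # trailing multi-line event is still readable
}
```

With time-based rotation, a multi-line event written in one burst is far less likely to straddle the rotation boundary than with a size threshold, which can trigger mid-event.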
The problem here is that your logging source is very poorly behaved; no logger should ever rotate files in the middle of an event. In any case, there is nothing Splunk can do about it, so fixing it upstream is your only option.
Actually, the normal behavior in this case is to assign an event that has no timestamp the _time value of the previous timestamped event from the same host/sourcetype, which in this case would be exactly the correct thing to do.
Thank you for your response, @cpetterborg. To confirm: are you saying that Splunk cannot handle this kind of case at index time?
Yes. Splunk recognizes that it is reading a different version of the file, so it assumes new events begin at the start of that file. It does not cache lines in the hope of matching the end of one file to the beginning of the next; it simply ends the one event and starts a new event at the first line of the new file.
So if you use time-based log rotation instead of size-based rotation, the lines belonging to the same event will most likely stay together in the same file, which will allow you (and the indexing) to combine them into a single event.