Are there any best practices or recommendations when dealing with very large log files?
I have a 50 GB log file that takes 2-3 hours to rotate. While this happens, a lot of events get duplicated. The rotation script currently copies the file with cp and then truncates it, but I see around 1,500 events on the forwarder saying:
Will begin reading at offset=0
throughout the rotation period.
Any thoughts?
How are you configured at the moment? Universal forwarder sending straight to an indexer?
You could consider using the props.conf configuration file to filter out data that is not of value before it is indexed.
Here's a question/answer that might be relevant: https://answers.splunk.com/answers/44865/remove-out-section-of-log.html
Here's a Cisco-specific how-to on excluding low-value events from networking logs; it could also be relevant to what you're looking to do:
http://networkerslog.blogspot.com/2012/01/how-to-filter-unwanted-data-without.html
Hope that helps!
Yes, we're using a universal forwarder and doing the filtering on the indexer. Only 5% of the entries in the log file are relevant.
The problem is not the filtering, though. When you rotate a 50 GB file, weird things start to happen as contention becomes an issue. At the heart of this is that letting a single file accumulate 50 GB of data in a single day is bad practice. I found a few posts stating that a log file can only grow to a few GB before a forwarder starts to have issues.
I got together with the application developer and the person who wrote the log rotation script, and they're going to redesign the logging to add more structure (separate files for unrelated events) and change the way the file is rotated.
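For anyone landing here later, here is a rough sketch of what that kind of redesign could look like. It assumes the application is written in Python, which may not match your setup, and the logger names, directory, and 1 GB cap are all placeholders I made up for illustration. The idea is one size-capped file per event category, rotated by rename rather than by cp and truncate:

```python
import logging
import logging.handlers
import os
import tempfile


def build_logger(name, directory, max_bytes=1024 ** 3, backups=10):
    """Return a logger that writes to its own size-capped file.

    `name`, `directory`, and the 1 GB default are illustrative assumptions,
    not values taken from the setup discussed in this thread.
    """
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{name}.log")

    # RotatingFileHandler renames the full file to <name>.log.1 and opens a
    # fresh <name>.log, so no multi-hour copy or truncate step is involved.
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep these events out of the root logger
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Hypothetical categories: unrelated event types go to separate files.
    log_dir = os.path.join(tempfile.gettempdir(), "myapp-logs")
    audit = build_logger("audit", log_dir)
    debug = build_logger("debug", log_dir)

    audit.info("user login accepted")
    debug.info("cache warmed in 120 ms")
```

With a layout like this, the forwarder would only need to monitor the file that holds the relevant 5% of events, and each file stays well under the size where rotation becomes a multi-hour job.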
Thanks!