Are there best practices when indexing very large log files while they are being rotated?

romedome · 02-05-2016

Are there any best practices or recommendations when dealing with very large log files?

I have a 50 GB log file that takes 2-3 hours to rotate. While this happens, a lot of events get duplicated. The rotation script in place right now does a cp and then truncate, but I see around 1,500 events on the forwarder saying:

Will begin reading at offset=0

throughout the rotation period.

Any thoughts?

pgreer_splunk · 02-05-2016

How are you configured at the moment? Universal forwarder sending straight to an indexer?

You could consider filtering out data that is not of value through the props.conf configuration file before it is indexed.

Here's a question/answer that might be relevant: https://answers.splunk.com/answers/44865/remove-out-section-of-log.html

Here's a Cisco-specific how-to on excluding low-value events from networking logs; it could also be relevant to what you're looking to do:

http://networkerslog.blogspot.com/2012/01/how-to-filter-unwanted-data-without.html

It does require the use of a 'heavy forwarder' to parse the data and apply regular expressions that exclude the data you don't want to index before it is sent to the indexer(s).

Hope that helps!

romedome · 02-08-2016

Yes, we're using a universal forwarder and doing the filtering on the indexer. Only 5% of the events in the log file are relevant.

The problem is not the filtering, though. When you rotate a 50 GB file, weird things start to happen as contention becomes an issue. The issue at the heart of this is that having a single file accumulate 50 GB of data in a single day is bad practice. I found a few posts stating that a forwarder starts to have issues once a log file grows past a few GB.

I got together with the application developer and the person who wrote the log rotation script, and they're going to redesign the logging to add more structure (separate files for unrelated events) and change the way the file is rotated.
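For anyone landing here later: the thread doesn't say what the new rotation will look like, so the following is only a minimal sketch of one common alternative to cp-then-truncate, which is rotating by rename. A rename on the same filesystem is a metadata operation, so it does not have to copy 50 GB while the application keeps writing. The file name app.log and the helper rotate_by_rename are hypothetical, and the sketch assumes the application can reopen its log file after rotation.

```python
import os
import tempfile
import time


def rotate_by_rename(log_path: str) -> str:
    """Rename the live log to a timestamped name and return the new path.

    os.replace does not copy file contents when source and destination are
    on the same filesystem, so its cost does not grow with the file size.
    """
    rotated_path = f"{log_path}.{time.strftime('%Y%m%d-%H%M%S')}"
    os.replace(log_path, rotated_path)
    return rotated_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "app.log")  # hypothetical file name

        # The application writes to the live log.
        with open(log_path, "a") as log:
            log.write("event 1\n")

        rotated = rotate_by_rename(log_path)

        # Assumption: the application reopens the path after rotation,
        # which creates a fresh, empty log file.
        with open(log_path, "a") as log:
            log.write("event 2\n")

        with open(rotated) as old, open(log_path) as new:
            print("rotated:", old.read().strip())
            print("live:", new.read().strip())
```

The trade-off is that the writer has to cooperate: a process that keeps its old file handle open will continue writing to the renamed file until it reopens the path.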

Thanks!
