Would monitoring files with logrotate and delayed compression cause reindexing?

hettervik
Builder

If I'm monitoring files that are being rotated with an added timestamp, and the rotated files are being compressed after a couple of days, could this cause reindexing of log events?

I know that Splunk supports reading compressed files, and that as long as you don't add crcSalt=<SOURCE>, rotating logs with a timestamp in the file name will not cause reindexing. However, the docs state that adding data to a compressed file will in fact cause reindexing (link). This confuses me. If Splunk decompresses files to read the checksum (to check whether the log file has already been indexed), why would adding data to a compressed file cause reindexing? And if Splunk doesn't read checksums that way for compressed files, how can we be sure that normally rotated log files with delayed compression won't be reindexed as well?

Hope someone can explain this to me. 🙂

harsmarvania57
SplunkTrust

Hi @hettervik,

Splunk handles log rotation by using a checksum of the file. In your scenario, where the log file is rotated and a timestamp is appended to its name, Splunk will not reindex the whole log file unless you set crcSalt = <SOURCE>.

For the compressed rotated files, I suggest using the whitelist parameter with a regular expression in the monitor stanza in inputs.conf, so that you monitor only the current and rotated files but not the compressed ones. Those rotated files have already been checked and indexed (if required) by Splunk.
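As a rough sketch of what I mean, assuming a hypothetical directory /var/log/myapp whose live file is app.log and whose rotated copies look like app.log-20240301 (adjust the path and regex to your own naming scheme):

```ini
# inputs.conf (sketch; the path and file names are hypothetical)
[monitor:///var/log/myapp]
# Match the live file and timestamp-rotated copies, e.g. app.log and app.log-20240301
whitelist = app\.log(-\d{8})?$
# Alternative: keep a broad monitor and only exclude the compressed copies
# blacklist = \.gz$
```

Either setting keeps the .gz copies out of the monitor input while leaving them on disk.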

hettervik
Builder

Hi. I know that Splunk will not re-index rotated log files because of the checksum. I'm also aware that I can blacklist the compressed files, but then what's the point of keeping them? The whole idea of keeping, say, a week of rotated files is that if Splunk or the network goes down, I have a whole week to notice and get it back up before losing data. If I blacklist the compressed files, I no longer have a week of on-disk log files for Splunk to read.

What I'm wondering is exactly how Splunk calculates and checks checksums of compressed files, and in which scenarios compressing monitored log files could cause re-indexing.
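To make the question concrete, here is a small Python sketch of the two possibilities. It is only an illustration and not Splunk's actual implementation: head_crc is a hypothetical helper, zlib.crc32 stands in for whatever checksum Splunk really uses, and the sample log lines are made up. The only value taken from Splunk is the 256-byte length, which matches the default of initCrcLength in inputs.conf.

```python
import gzip
import zlib

HEAD_LEN = 256  # same length as the default initCrcLength in inputs.conf


def head_crc(data: bytes, length: int = HEAD_LEN) -> int:
    """Hypothetical helper: CRC32 over the first `length` bytes of the data."""
    return zlib.crc32(data[:length])


# Made-up stand-in for the contents of a rotated log file.
plain = b"".join(
    b"2024-03-01 12:00:%02d INFO event %d\n" % (i % 60, i) for i in range(100)
)

# What the delayed compression step would leave on disk (mtime fixed for repeatability).
compressed = gzip.compress(plain, mtime=0)

crc_plain = head_crc(plain)                                  # rotated, uncompressed file
crc_of_decompressed = head_crc(gzip.decompress(compressed))  # checksum taken after decompressing
crc_of_raw_gz = head_crc(compressed)                         # checksum taken on the raw .gz bytes

print("decompressed head matches original:", crc_plain == crc_of_decompressed)
print("raw .gz head matches original:", crc_plain == crc_of_raw_gz)
```

If the check were done on the decompressed content, the compressed copy would look like the file that was already indexed. If it were done on the raw .gz bytes, the head would start with the gzip header rather than the log text, so it would look like a new file. Which of these Splunk actually does is what I'm asking.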

harsmarvania57
SplunkTrust

Generally, Splunk should not re-index the compressed file. I tested this in my lab environment and the compressed file was not re-indexed. However, looking at another thread, https://answers.splunk.com/answers/223263/why-is-a-gz-file-created-by-log-rotation-indexed-a.html, it looks like a race condition might cause Splunk to re-index the file. In general, I suggest blacklisting the compressed files and, whenever required, manually uncompressing and indexing them.
