Getting Data In

How does Splunk handle *nix logrotate-based log rotation?

abonuccelli_spl
Splunk Employee

Hi,

What will happen if I use Splunk to index Apache or syslog files that get rotated to *.gz?

Will the data be reprocessed?

What is the default behaviour on Splunk 5?

I've found a couple of old answers:

http://answers.splunk.com/answers/10309/log-file-rotation
http://answers.splunk.com/answers/12729/will-splunk-re-index-a-log-file-if-i-compress-it-after-its-b...

but I'm not entirely sure about the actual behaviour on Splunk 5.

1 Solution

abonuccelli_spl
Splunk Employee

Splunk will not re-index already processed files after they get gzipped.
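One way to picture why the gzipped copy is skipped: as I understand the Splunk documentation, the file monitor identifies a file by a checksum of its content (by default over the first 256 bytes) rather than by its name. The Python sketch below only illustrates that idea and is not Splunk's code. `open_maybe_gzip` and `fingerprint` are hypothetical helpers, and `zlib.crc32` stands in for whatever checksum Splunk actually uses.

```python
import gzip
import zlib


def open_maybe_gzip(path):
    """Open a file for binary reading, decompressing it if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def fingerprint(path, length=256):
    """Return a CRC32 of the first `length` bytes of the decompressed content.

    Illustration only: the 256-byte default mirrors the documented Splunk
    default, but the checksum function here is an assumption.
    """
    with open_maybe_gzip(path) as fh:
        return zlib.crc32(fh.read(length))
```

With a fingerprint like this, a log and its gzipped copy get the same value, so a monitor that remembers fingerprints has no reason to read the rotated file again.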

For example, take a default monitor stanza like this:

user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ pwd
/opt/SPLUNK/5.0.5/splunk
user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ ./bin/splunk btool inputs list monitor:///var/log/apache2
[monitor:///var/log/apache2]
_rcvbuf = 1572864
disabled = false
followTail = 0
host = linux-test-host
index = default
sourcetype = access_combined

and a folder like this:


user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ ls -alrth /var/log/apache2
total 9.9M
drwxr-xr-x 18 root root 4.0K Feb 4 16:42 ..
-rw-r--r-- 1 root root 355 Feb 4 16:42 error.log.5.gz
-rw-r--r-- 1 root root 33K Feb 4 16:58 other_vhosts_access.log.5.gz
-rw-rw-rw- 1 root adm 353 Feb 4 16:58 error.log.4.gz
-rw-rw-rw- 1 root adm 1.7K Feb 4 16:59 other_vhosts_access.log.4.gz
-rw-rw-rw- 1 root adm 355 Feb 4 16:59 error.log.3.gz
-rw-rw-rw- 1 root adm 2.3K Feb 4 17:00 other_vhosts_access.log.3.gz
-rw-rw-rw- 1 root adm 353 Feb 4 17:00 error.log.2.gz
-rw-rw-rw- 1 root adm 11K Feb 4 17:02 other_vhosts_access.log.2.gz
-rw-rw-rw- 1 root adm 354 Feb 4 17:02 error.log.1.gz
-rw-rw-rw- 1 root adm 11K Feb 4 17:04 other_vhosts_access.log.1.gz
-rw-rw-rw- 1 root adm 280 Feb 4 17:04 error.log
drwxr-x--- 2 root adm 4.0K Feb 4 17:04 .
-rw-rw-rw- 1 root adm 9.8M Feb 4 17:09 other_vhosts_access.log

user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ cp /var/log/apache2/other_vhosts_access.log.1.gz /tmp/
user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ gunzip -d /tmp/other_vhosts_access.log.1.gz
user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ wc -l /tmp/other_vhosts_access.log.1
28644 /tmp/other_vhosts_access.log.1
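The same line count can be taken without copying and unpacking the archive first. A minimal Python sketch, assuming only the standard `gzip` module (the helper name `count_lines_gz` is made up for this illustration), that counts newline characters the way `wc -l` does:

```python
import gzip


def count_lines_gz(path, chunk_size=1 << 20):
    """Count newline characters in a gzip file, like `gunzip -c path | wc -l`."""
    count = 0
    with gzip.open(path, "rb") as fh:
        # Read in chunks so large rotated logs are never held in memory whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            count += chunk.count(b"\n")
    return count
```

Calling `count_lines_gz("/var/log/apache2/other_vhosts_access.log.1.gz")` should give the figure to compare against the event count Splunk reports for that source. The two only match when each event is a single line, as with access logs.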

user@linux-test-host /opt/SPLUNK/5.0.5/splunk $ ./bin/splunk search "source=/var/log/apache2* | stats count by source"

source                                         count
/var/log/apache2/error.log                     2
/var/log/apache2/error.log.1.gz                4
/var/log/apache2/error.log.2.gz                4
/var/log/apache2/error.log.3.gz                4
/var/log/apache2/error.log.4.gz                4
/var/log/apache2/error.log.5.gz                4
/var/log/apache2/other_vhosts_access.log       90875
/var/log/apache2/other_vhosts_access.log.1.gz  28644
/var/log/apache2/other_vhosts_access.log.2.gz  28517
/var/log/apache2/other_vhosts_access.log.3.gz  5341
/var/log/apache2/other_vhosts_access.log.4.gz  3732
/var/log/apache2/other_vhosts_access.log.5.gz  84227

The tests above started from an empty folder, with several log rotation cycles run manually using: logrotate --force /etc/logrotate.d/apache2

When rotation happens, Splunk finds a compressed file that it already processed in uncompressed form (or in compressed form, if it started from an empty folder) and skips it, as this entry from splunkd.log shows:

02-04-2014 17:28:39.513 +0000 INFO ArchiveProcessor - Archive with path="/var/log/apache2/other_vhosts_access.log.1.gz" was already indexed as a non-archive, skipping.
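To check which rotated archives were skipped on your own system, you can scan splunkd.log for that message. A rough Python sketch, assuming the message wording is exactly as in the line above (it may differ between Splunk versions); `SKIP_RE` and `skipped_archives` are names invented for this example:

```python
import re

# Matches the ArchiveProcessor message quoted above and captures the path.
SKIP_RE = re.compile(
    r'ArchiveProcessor - Archive with path="(?P<path>[^"]+)" '
    r"was already indexed as a non-archive, skipping\."
)


def skipped_archives(lines):
    """Return the archive paths that ArchiveProcessor reported as skipped."""
    paths = []
    for line in lines:
        match = SKIP_RE.search(line)
        if match:
            paths.append(match.group("path"))
    return paths
```

Pass it any iterable of log lines, for instance an open file handle on splunkd.log (normally under var/log/splunk in the Splunk install directory).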

