Hi,
I've mounted some NFS and nfs locally to Splunk some files I want to monitor remotely.
Problem is files are continuously re-indexed and I am getting duplicated events.
In the logs I can see this is continuously re-indexed:
02-21-2014 14:16:17.914 +0200 INFO WatchedFile - Will begin reading at offset=0 for file='/mnt/nfs-mounted-share/app1/logs/main.log'.
02-21-2014 14:16:38.557 +0200 INFO BatchReader - Removed from queue file='/mnt/nfs-mounted-share/app1/logs/main.log'.
02-21-2014 14:23:52.077 +0200 INFO WatchedFile - Checksum for seekptr didn't match, will re-read entire file='/mnt/nfs-mounted-share/app1/logs/main.log'.
02-21-2014 14:23:52.078 +0200 INFO WatchedFile - Will begin reading at offset=0 for file='/mnt/nfs-mounted-share/app1/logs/main.log'.
02-21-2014 14:24:11.745 +0200 INFO BatchReader - Removed from queue file='/mnt/nfs-mounted-share/app1/logs/main.log'.
02-21-2014 15:01:17.065 +0200 INFO WatchedFile - Checksum for seekptr didn't match, will re-read entire file='/mnt/nfs-mounted-share/app1/logs/main.log'.
02-21-2014 15:01:17.065 +0200 INFO WatchedFile - Will begin reading at offset=0 for file='/mnt/nfs-mounted-share/app1/logs/main.log'.
02-21-2014 15:01:37.700 +0200 INFO BatchReader - Removed from queue file='/mnt/nfs-mounted-share/app1/logs/main.log'.
02-21-2014 15:07:51.908 +0200 INFO WatchedFile - Checksum for seekptr didn't match, will re-read entire file='/mnt/nfs-mounted-share/app1/logs/main.log'.
02-21-2014 15:07:51.908 +0200 INFO WatchedFile - Will begin reading at offset=0 for file='/mnt/nfs-mounted-share/app1/logs/main.log'.
02-21-2014 15:08:12.253 +0200 INFO BatchReader - Removed from queue file='/mnt/nfs-mounted-share/app1/logs/main.log'.
02-21-2014 15:13:41.283 +0200 INFO WatchedFile - Checksum for seekptr didn't match, will re-read entire file='/mnt/nfs-mounted-share/app1/logs/main.log'.
02-21-2014 15:13:41.284 +0200 INFO WatchedFile - Will begin reading at offset=0 for file='/mnt/nfs-mounted-share/app1/logs/main.log'.
By typing 'mount' I can see /mnt/nfs-mounted-share is mounted with options:
//192.168.123.234/logs on /mnt/nfs-mounted-share/app1/logs type nfs (rw,relatime,username=john,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.123.234,file_mode=0755,dir_mode=0755,nounix,rsize=61440,wsize=65536,actimeo=1)
I went on the server where the file is hosted and verified the file has't rotated!
(4ff7ceef43b51e85487a8f6bf6a00062 == 4ff7ceef43b51e85487a8f6bf6a00062)
$ while true; do date; echo 'InitCrc:'; head -1 /opt/app1/logs/main.log|md5sum; echo 'SeekCrc:'; tail -1 /opt/app1/logs/main.log | md5sum; echo ''; sleep 60; done
Tue Apr 8 15:00:48 BST 2014
InitCrc:
4ff7ceef43b51e85487a8f6bf6a00062 -
SeekCrc:
90e3811d7d2a94d16d1b43a6b470f407 -
Tue Apr 8 15:01:48 BST 2014
InitCrc:
4ff7ceef43b51e85487a8f6bf6a00062 -
SeekCrc:
b3837abba7bbda9c912b6c16fef20f23 -
Tue Apr 8 15:02:48 BST 2014
InitCrc:
4ff7ceef43b51e85487a8f6bf6a00062 -
SeekCrc:
dcffef3590881f5fbad551869d2d140d -
Tue Apr 8 15:03:48 BST 2014
InitCrc:
4ff7ceef43b51e85487a8f6bf6a00062 -
SeekCrc:
87d93c46410f4dbf900e58ae28370983 -
Tue Apr 8 15:04:48 BST 2014
InitCrc:
4ff7ceef43b51e85487a8f6bf6a00062 -
SeekCrc:
6e90a42e914f3a07b967de219f26864b -
Tue Apr 8 15:05:48 BST 2014
InitCrc:
4ff7ceef43b51e85487a8f6bf6a00062 -
SeekCrc:
6e90a42e914f3a07b967de219f26864b -
Same thing is happening on another setup with CIFS mount!
What is happening?
Most likely filesystem attributes are caching file attributes which will tamper Splunk ability to detect actual CRCs values for initptr/seekptr/modtime.
In a healthy scenario, SeekPtr(scrc) matches what Splunk has saved in fishbucket:
"03-24-2014 17:08:05.226 +0000 DEBUG WatchedFile - seeking /mnt/nfs-mounted-share/app1/logs/main.log to off=37624"
"03-24-2014 17:08:05.227 +0000 DEBUG WatchedFile - Reached EOF: fname=/mnt/nfs-mounted-share/app1/logs/main.log fishstate=key=0xa8ad07cc1926ac37 **sptr**=37722 **scrc**=0x9a0b880ed85b3357 **fnamecrc**=0x794111686abf86a7 **modtime**=1395680878"
"03-24-2014 17:08:08.221 +0000 DEBUG TailingProcessor - Deferred notification for path='/mnt/nfs-mounted-share/app1/logs/main.log'."
"03-24-2014 17:08:08.221 +0000 DEBUG TailingProcessor - Will attempt to read file: /mnt/nfs-mounted-share/app1/logs/main.log from existing fd."
"03-24-2014 17:08:08.226 +0000 DEBUG TailingProcessor - About to read data (Reusing existing fd for file='/mnt/nfs-mounted-share/app1/logs/main.log')."
In a fs-cached scenario, what happens is:
Splunk compares X which is different from MD5(seekptr(150)),says:
"INFO WatchedFile - Checksum for seekptr didn't match, will re-read entire file"
and re-index the whole file.
Options specified in /etc/fstab are often ignored or not honoured depending on what version of various related components are in use.
The correct way to check what options are being used is to run mount on client side
$ mount | egrep logs
nfsserver:/logs on /mnt/nfs-mounted-share/logs type nfs (rw,noexec,nosuid,nodev,noac,vers=4,addr=192.168.32.23,clientaddr=0.0.0.0)
when using NFS,
Options like actimeo=1 should not be used clientside:
acregmin=n The minimum time (in seconds) that the NFS client caches
attributes of a regular file before it requests fresh
attribute information from a server. If this option is
not specified, the NFS client uses a 3-second minimum.
acregmax=n The maximum time (in seconds) that the NFS client caches
attributes of a regular file before it requests fresh
attribute information from a server. If this option is
not specified, the NFS client uses a 60-second maximum.
acdirmin=n The minimum time (in seconds) that the NFS client caches
attributes of a directory before it requests fresh
attribute information from a server. If this option is
not specified, the NFS client uses a 30-second minimum.
acdirmax=n The maximum time (in seconds) that the NFS client caches
attributes of a directory before it requests fresh
attribute information from a server. If this option is
not specified, the NFS client uses a 60-second maximum.
actimeo=n Using actimeo sets all of acregmin, acregmax, acdirmin,
and acdirmax to the same value. If this option is not
specified, the NFS client uses the defaults for each of
these options listed above.
http://manpages.ubuntu.com/manpages/hardy/man5/nfs.5.html
To disable caching nfs clientside 'noac' option instead should be enforced.
Most likely filesystem attributes are caching file attributes which will tamper Splunk ability to detect actual CRCs values for initptr/seekptr/modtime.
In a healthy scenario, SeekPtr(scrc) matches what Splunk has saved in fishbucket:
"03-24-2014 17:08:05.226 +0000 DEBUG WatchedFile - seeking /mnt/nfs-mounted-share/app1/logs/main.log to off=37624"
"03-24-2014 17:08:05.227 +0000 DEBUG WatchedFile - Reached EOF: fname=/mnt/nfs-mounted-share/app1/logs/main.log fishstate=key=0xa8ad07cc1926ac37 **sptr**=37722 **scrc**=0x9a0b880ed85b3357 **fnamecrc**=0x794111686abf86a7 **modtime**=1395680878"
"03-24-2014 17:08:08.221 +0000 DEBUG TailingProcessor - Deferred notification for path='/mnt/nfs-mounted-share/app1/logs/main.log'."
"03-24-2014 17:08:08.221 +0000 DEBUG TailingProcessor - Will attempt to read file: /mnt/nfs-mounted-share/app1/logs/main.log from existing fd."
"03-24-2014 17:08:08.226 +0000 DEBUG TailingProcessor - About to read data (Reusing existing fd for file='/mnt/nfs-mounted-share/app1/logs/main.log')."
In a fs-cached scenario, what happens is:
Splunk compares X which is different from MD5(seekptr(150)),says:
"INFO WatchedFile - Checksum for seekptr didn't match, will re-read entire file"
and re-index the whole file.
Options specified in /etc/fstab are often ignored or not honoured depending on what version of various related components are in use.
The correct way to check what options are being used is to run mount on client side
$ mount | egrep logs
nfsserver:/logs on /mnt/nfs-mounted-share/logs type nfs (rw,noexec,nosuid,nodev,noac,vers=4,addr=192.168.32.23,clientaddr=0.0.0.0)
when using NFS,
Options like actimeo=1 should not be used clientside:
acregmin=n The minimum time (in seconds) that the NFS client caches
attributes of a regular file before it requests fresh
attribute information from a server. If this option is
not specified, the NFS client uses a 3-second minimum.
acregmax=n The maximum time (in seconds) that the NFS client caches
attributes of a regular file before it requests fresh
attribute information from a server. If this option is
not specified, the NFS client uses a 60-second maximum.
acdirmin=n The minimum time (in seconds) that the NFS client caches
attributes of a directory before it requests fresh
attribute information from a server. If this option is
not specified, the NFS client uses a 30-second minimum.
acdirmax=n The maximum time (in seconds) that the NFS client caches
attributes of a directory before it requests fresh
attribute information from a server. If this option is
not specified, the NFS client uses a 60-second maximum.
actimeo=n Using actimeo sets all of acregmin, acregmax, acdirmin,
and acdirmax to the same value. If this option is not
specified, the NFS client uses the defaults for each of
these options listed above.
http://manpages.ubuntu.com/manpages/hardy/man5/nfs.5.html
To disable caching nfs clientside 'noac' option instead should be enforced.
I am running in the same issue. But we have the logs written on the CIFS share which are mounted onto Linux servers. I am running Splunk UF on these servers, that are monitoring the logs from this mount. we are seeing a great delay in the events being searchable (~3 hours delay until the data is searchable)
Not sure where the delay is happening, either on Splunk UF monitoring the files or Indexing by indexers. can you please advise what can be done in this scenario?
CIFS ---> Mount on Linux ---> Splunk UF ---> indexers ---> Data Searchable in SH