Getting Data In

What happens to the seek address if I append new data to a log file (.csv) and save it while Splunk is indexing the file?

kevinvrb
Engager

Hey all,

I have a daily .csv log file that gets updated with new info every time another app finishes some jobs. I'm getting some duplicate events indexed by Splunk from such files, and after reading a lot about how Splunk handles log files, I suspect that the file is updated while Splunk is still indexing new lines from its latest version. For example:

File August-1.csv has the following lines at 10:00:00 A.M:

1
2
3

File August-1.csv has the following lines at 10:00:01 A.M:

1
2
3
4
5

File August-1.csv has the following lines at 10:00:03 A.M:

1
2
3
4
5
6
7
8

If Splunk is still indexing the new lines (4, 5) from the 10:00:01 A.M. version and hasn't finished by 10:00:03 A.M., does it save the seek address at "5" and then start indexing the new lines (6, 7, 8) from the 10:00:03 A.M. version, or is the current indexing cancelled and indexing of the 10:00:03 A.M. version restarted from the last known seek address, which was "3"?

Thanks for any help!


martin_mueller
SplunkTrust

Splunk is pretty good at keeping track of where it left off reading files that get appended. It keeps a pointer of where it left off, along with some checksums to detect non-append changes or log rotation.
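To make the idea concrete, here is a rough Python sketch of a tailing reader that keeps a seek pointer plus a checksum over the start of the file. It is only a simplified model of the behaviour described above, not Splunk's actual implementation; `TailState`, `read_new_data` and the 256-byte `HEAD_BYTES` value are names and numbers I made up for illustration.

```python
import os
import zlib
from dataclasses import dataclass

HEAD_BYTES = 256  # assumption: how much of the file start the checksum covers


@dataclass
class TailState:
    offset: int = 0    # seek pointer: number of bytes already consumed
    head_len: int = 0  # how many leading bytes head_crc was computed over
    head_crc: int = 0  # checksum of those leading bytes (crc32 of b"" is 0)


def read_new_data(path: str, state: TailState) -> bytes:
    """Return bytes not seen before; start over if the file was not just appended to."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        head = f.read(state.head_len)
        # A shrunken file or a changed beginning means this is not a pure append.
        if size < state.offset or zlib.crc32(head) != state.head_crc:
            state.offset = 0  # start over: this is where duplicates come from
        f.seek(state.offset)
        data = f.read()
        # The pointer advances only by what was actually read in this pass.
        state.offset += len(data)
        # Refresh the checksum over the (possibly longer) file start.
        f.seek(0)
        new_head = f.read(HEAD_BYTES)
        state.head_len = len(new_head)
        state.head_crc = zlib.crc32(new_head)
        return data
```

In this model, lines appended while a pass is running are either picked up by that pass or left for the next one; the pointer never jumps back unless the checksum or the size check fails. Rewriting anything near the start of the file is what triggers a full re-read.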

Reindexing is usually caused by something making a change to a monitored file that isn't an append, e.g. changing something at the beginning of the file. Make sure your various applications only add stuff to the end of the file. If you can't do that, make the applications write the file somewhere else or under some other name, and move/rename the file to a path monitored by Splunk after you've finished updating it.
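On the writer side, the two options could look roughly like the following Python sketch. The function names and the staging/monitored directory layout are hypothetical; the sketch also assumes both directories are on the same filesystem, since otherwise the final rename is not atomic.

```python
import os
from typing import Iterable


def append_row(path: str, row: str) -> None:
    """Option 1: only ever add to the end of the monitored file."""
    # Mode "a" places every write at the current end of the file,
    # so existing bytes (and any checksum over them) stay untouched.
    with open(path, "a", newline="") as f:
        f.write(row + "\n")


def publish_when_complete(staging_dir: str, monitored_dir: str,
                          name: str, rows: Iterable[str]) -> str:
    """Option 2: build the file outside the monitored path, then move it in."""
    staged = os.path.join(staging_dir, name)
    final = os.path.join(monitored_dir, name)
    with open(staged, "w", newline="") as f:
        for row in rows:
            f.write(row + "\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the content is on disk before the move
    # os.replace renames in one step when both paths are on the same filesystem,
    # so a reader of the monitored directory never sees a half-written file.
    os.replace(staged, final)
    return final
```

Option 2 only fits files that are finished before they are handed over; for a file that keeps growing during the day, option 1 is the one that applies.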

For additional info, there should be an entry from splunkd.log in index=_internal that mentions the reason for starting over with a monitored file.
