Getting Data In

CSV Monitoring issues

pkeller
Contributor

[monitor:///home/paul/training_status/]
whitelist = (\.csv$|\.CSV$)
blacklist = \.filepart$
index = training_index
sourcetype = training_status
crcSalt = <SOURCE>

The file is updated once per week. In many cases, the file is not fully consumed. The most recent update missed 19 records (which had been consumed the last time the file was updated).

splunkd.log shows:

04-06-2017 07:39:19.584 -0700 INFO WatchedFile - Will begin reading at offset=4234 for file='/home/paul/training_status/filename.csv'

So my uneducated guess is that splunkd sees data it has already consumed and therefore skips those 19 records before it starts ingesting.

How do I prevent this? I thought setting crcSalt = <SOURCE> was supposed to handle this.

Thank you.


DalJeanis
SplunkTrust

crcSalt = <SOURCE> instructs Splunk to include the full file path and name, in addition to the first 256 bytes of the file, when computing the checksum it uses to decide whether it has already indexed a file. If the filename does not change, Splunk will resume indexing wherever it left off (or wherever the data changed).
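To make that concrete, here is an illustrative sketch in Python (not Splunk's actual implementation) of the kind of fingerprint crcSalt affects: a CRC over the first 256 bytes, salted with the source path. If neither the path nor those first bytes change, the fingerprint matches, so Splunk treats the file as already known and simply resumes from its saved read offset rather than re-reading from the top.

```python
# Illustrative sketch only -- NOT Splunk's real code. It shows why an
# updated file with an unchanged name and unchanged header bytes is
# still recognized as the "same" file.
import zlib

def file_fingerprint(path, salt=""):
    """CRC32 over the first 256 bytes, optionally salted (e.g. with the path)."""
    with open(path, "rb") as f:
        head = f.read(256)
    return zlib.crc32(salt.encode() + head)
```

With salt set to the file's own path (the effect of crcSalt = <SOURCE>), the fingerprint still only changes when the path or the leading bytes change, which is why it does not force old records to be re-consumed.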

If you want the same records to be consumed again each time the file is updated, the easy options are to (A) put a timestamp in the file name, (B) add an update-timestamp column to each row of the CSV, or (C) add a timestamp to the header of the file.
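For option (A), the weekly export job could publish the file under a timestamped name. A minimal sketch, with a hypothetical helper name and paths (not from this thread); note that a name like training_status.20170406.csv still matches the whitelist in the monitor stanza:

```python
# Hypothetical helper for option (A): copy each weekly export into the
# monitored directory under a date-stamped name, so Splunk sees a new
# source every week. Names and paths here are assumptions for illustration.
import shutil
import time
from pathlib import Path

def publish_with_timestamp(src, dest_dir):
    """Copy src into dest_dir as e.g. training_status.20170406.csv."""
    src = Path(src)
    stamp = time.strftime("%Y%m%d")
    dest = Path(dest_dir) / f"{src.stem}.{stamp}{src.suffix}"
    shutil.copy2(src, dest)
    return dest
```

You would also want the export job (or a cleanup cron) to prune old copies, since each week leaves a new file behind.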

Alternatively, assuming the "source of record" for the file lives somewhere safe, you could have Splunk delete the file once it has finished indexing it, so that any file it finds will be "new".
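One built-in way to get that delete-after-indexing behavior is a batch input with move_policy = sinkhole, which indexes each file once and then removes it. A sketch based on the stanza in the question; verify the details against the inputs.conf spec for your Splunk version, and only use this if the source of record is kept elsewhere:

```
# inputs.conf sketch: batch input deletes each file after it is indexed.
[batch:///home/paul/training_status]
move_policy = sinkhole
index = training_index
sourcetype = training_status
whitelist = (\.csv$|\.CSV$)
```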


pkeller
Contributor

Thank you. This makes things very clear. - Cheers
