I'm monitoring a folder but I'm not seeing all the files getting indexed into Splunk.
Then I ran this search:
index=_internal sourcetype="splunkd" log_level="ERROR"
and found several events indicating the reason files were not indexed.
04-26-2010 11:58:04.265 ERROR TailingProcessor - Ignoring path due to: File will not be read, is too small to match seekptr checksum (file=C:\Program Files\WebSphere\profiles\AppSrv01\config\cells\sfeserv36Node01Cell\PolicySets\WSReliableMessaging persistent\PolicyTypes\WSReliableMessaging\policy.xml). Last time we saw this initcrc, filename was different. You may wish to use a CRC salt on this source. Consult the documentation or contact Splunk Support for more info.
I do not understand why Splunk is telling me that the filename was different.
Help?
Splunk performs a CRC check of the files it tries to index. The error you report implies that we had indexed a file with the same CRC value. Even if the file name is different, we will not index it unless you use the CRC salt parameter for the input. This prevents Splunk from reindexing the same log file, even though you may have renamed it.
Sometimes files that begin with the same few header lines will confuse Splunk, because we don't compute the CRC against the whole file, only against the beginning of it (the first 256 bytes by default). In those cases, you should use the crcSalt parameter:
crcSalt = <SOURCE>
If set, this string is added to the CRC. Use this setting to force Splunk to consume files that have matching CRCs. If set to crcSalt = <SOURCE> (note: this setting is case sensitive), then the full source path is added to the CRC.
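To make the idea concrete, here is a rough Python sketch of a checksum taken over only the start of a file, with and without a salt. It is not Splunk's actual code: the use of zlib.crc32, the way the salt is prepended, the helper name init_crc, and the example paths are all assumptions made for illustration.

```python
import zlib

HEAD_BYTES = 256  # assumption: size of the initial window that gets checksummed


def init_crc(content: bytes, salt: str = "") -> int:
    """CRC32 over an optional salt followed by the first HEAD_BYTES of content."""
    return zlib.crc32(salt.encode("utf-8") + content[:HEAD_BYTES])


# Two different files that share a long, identical header (well over 256 bytes).
header = b'<?xml version="1.0" encoding="UTF-8"?>\n' + b"<!-- boilerplate -->\n" * 20
file_a = header + b"<policy name='A'/>\n"
file_b = header + b"<policy name='B'/>\n"

# Without a salt the two files look identical, so the second one would be skipped.
unsalted_collide = init_crc(file_a) == init_crc(file_b)

# Salting with each file's (hypothetical) source path tells them apart.
salted_collide = init_crc(file_a, "/cfg/a/policy.xml") == init_crc(file_b, "/cfg/b/policy.xml")

# Side effect: the same content under a different path also looks new once salted.
renamed_looks_new = init_crc(file_a, "/cfg/a/policy.xml") != init_crc(file_a, "/cfg/c/policy.xml")

print(unsalted_collide, salted_collide, renamed_looks_new)  # prints: True False True
```

The last check is the trade-off of salting with the source path: a file that is merely renamed no longer matches its earlier checksum.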
For reference:
http://docs.splunk.com/Documentation/Splunk/5.0/Data/Monitorfilesanddirectories
Is there a way to delete the CRCs of the previous indexing activity? I deleted the index and the data input and basically tried to start over but my files won't index again.
You could either empty the fishbucket or add a random crcSalt in your inputs.conf.
Adding a salt changes the checksum of each file, so Splunk will index the files again.
Skalli
Just to be completely clear about this setting... Nicholas, you received this message on an XML config file, which is where adding the crcSalt setting is helpful. But you should probably not add this to monitors that are indexing traditional log files. The danger of adding "crcSalt = <SOURCE>" everywhere is that it would re-index a log file after it is rotated, so you could end up with the same events loaded many, many times.
You can check the duplicated events along with their time of indexing with the below query:
index=<your_index> sourcetype=<your_sourcetype> | eval dup=_raw | convert ctime(_time) as T1 | convert ctime(_indextime) as indextime | transaction dup mvlist=t maxspan=1s keepevicted=true | table dup,source,sourcetype,host,index,indextime
Process to delete the duplicated events:
index=* sourcetype=wsa_accesslogs | eval id=_cd."|".index."|".splunk_server | transaction _raw maxspan=1s keepevicted=true mvlist=t | search eventcount>1 | eval delete_id=mvindex(id, 1, -1) | stats c by delete_id | outputlookup delete_these.csv
Note: Wait until the search has completed before continuing. You can use smart mode as well.
You can also check the newly created lookup table at $SPLUNK_HOME\etc\apps\app_name\lookups\delete_these.csv
Then run this search to delete the events listed in the lookup:
index=* sourcetype=wsa_accesslogs | eval delete_id=_cd."|".index."|".splunk_server | search [|inputlookup delete_these.csv | fields delete_id | format "(" "(" "OR" ")" "OR" ")"] | delete
Happy Splunking
Thank you Simeon and Wolverine! It works now with crcSalt = <SOURCE>