Filename was different, therefore source is not indexed. Why?

Nicholas_Key
Splunk Employee

I'm monitoring a folder but I'm not seeing all the files getting indexed into Splunk.

Then I ran this search:

index=_internal sourcetype="splunkd" log_level="ERROR"

and found several events indicating the reason files were not indexed.

04-26-2010 11:58:04.265 ERROR TailingProcessor - Ignoring path due to: File will not be read, is too small to match seekptr checksum (file=C:\Program Files\WebSphere\profiles\AppSrv01\config\cells\sfeserv36Node01Cell\PolicySets\WSReliableMessaging persistent\PolicyTypes\WSReliableMessaging\policy.xml).  Last time we saw this initcrc, filename was different.  You may wish to use a CRC salt on this source.  Consult the documentation or contact Splunk Support for more info.

I do not understand why Splunk is telling me that the filename was different.

Help?

1 Solution

Simeon
Splunk Employee

Splunk performs a CRC check on the files it tries to index. The error you report means that we have already indexed a file with the same CRC value. Even if the file name is different, we will not index the file unless you use the CRC salt parameter for the input. This is what prevents Splunk from reindexing the same log file after you rename it.

Sometimes, if you have files that share the same first few header lines, this will confuse Splunk, because the CRC is computed against only the beginning of the file, not the whole file. In those cases, you should use the crcSalt parameter:

crcSalt = <SOURCE>

If set, this string is added to the CRC. Use this setting to force Splunk to consume files that have matching CRCs. If set to crcSalt = &lt;SOURCE&gt; (note: this setting is case sensitive), the full source path is added to the CRC.
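
For example, a minimal inputs.conf stanza with the salt applied might look like this (the monitored path below is a hypothetical stand-in for your own input):

[monitor://C:\Program Files\WebSphere\profiles\AppSrv01\config]
# <SOURCE> is a literal value here; at CRC time Splunk substitutes each
# file's full path, so files with identical headers under different
# paths no longer collide
crcSalt = <SOURCE>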

For reference:

http://docs.splunk.com/Documentation/Splunk/5.0/Data/Monitorfilesanddirectories

_jgpm_
Communicator

Is there a way to delete the CRCs of the previous indexing activity? I deleted the index and the data input and basically tried to start over but my files won't index again.

skalliger
SplunkTrust

You could either empty the fish bucket or add a random crcSalt in your inputs.conf.
Adding a salt changes the hash of the files, so Splunk will index them again.
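
For example, a sketch of the inputs.conf change (the stanza path and salt string below are hypothetical; any new string will do):

[monitor:///var/log/myapp]
# Arbitrary string; changing it changes the CRC computed for every file
# in this input, so previously seen files are indexed again
crcSalt = reindex_pass_1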

Skalli

0 Karma

Lowell
Super Champion

Just to be completely clear about this setting... Nicholas, you received this message on an XML config file, which is where adding the crcSalt setting is helpful. But you should probably not add it to monitors that are indexing traditional log files. The danger of adding "crcSalt = <SOURCE>" everywhere is that rotating a log file changes its path, and therefore its salted CRC, so Splunk would re-index the file after every rotation and you could end up with the same events loaded many times.
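
To illustrate with a hypothetical rotating input (the path and rotation scheme are made up for the example):

[monitor:///var/log/httpd/access.log*]
crcSalt = <SOURCE>

# Day 1: access.log is read; its CRC is salted with ".../access.log".
# Day 2: logrotate renames it to access.log.1; the salt becomes
# ".../access.log.1", the salted CRC no longer matches the fishbucket
# entry, and Splunk re-reads the entire file as if it were new.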

puneethgowda
Communicator

You can check the duplicated events, along with their time of indexing, with the following query:

index=<your_index> sourcetype=<your_sourcetype>
| eval dup=_raw
| convert ctime(_time) as T1
| convert ctime(_indextime) as indextime
| transaction dup mvlist=t maxspan=1s keepevicted=true
| table dup, source, sourcetype, host, index, indextime

Process to delete the duplicated events:

  1. Run the following search to store all duplicate events in a lookup table:

index=* sourcetype=wsa_accesslogs
| eval id=_cd."|".index."|".splunk_server
| transaction _raw maxspan=1s keepevicted=true mvlist=t
| search eventcount>1
| eval delete_id=mvindex(id, 1, -1)
| stats c by delete_id
| outputlookup delete_these.csv

  2. Once the search has finished completely, you can view the events stored in the lookup table by running:

| inputlookup delete_these.csv

Note: You need to wait until the search completes. You can use smart mode as well.
You can also find the newly created lookup table at $SPLUNK_HOME\etc\apps\<app_name>\lookups\delete_these.csv

  3. Run the following search to delete all events from the sourcetype that also exist in the lookup table (in your case, delete_these.csv):

index=* sourcetype=wsa_accesslogs
| eval delete_id=_cd."|".index."|".splunk_server
| search [| inputlookup delete_these.csv | fields delete_id | format "(" "(" "OR" ")" "OR" ")"]
| delete

Happy Splunking

Nicholas_Key
Splunk Employee

Thank you Simeon and Wolverine! It works now with crcSalt = &lt;SOURCE&gt;
