We had a disk failure on our indexer. During the outage, Splunk believed it was still indexing data. We had to stop Splunk, remount the disk, and start it again. However, for the period that the disk (containing one of our indexes) was offline, we now have a gap where we don't have any events.
The logs are still available on the application servers and they run universal forwarders.
I want to re-index just the missing 3-hour window. If I push the whole log via oneshot (each log contains events from before and after the disk outage), I will get duplicate events, just as I would if I deleted the _fishbucket on the forwarders. This is production data.
What are my options in this instance?
Thanks
Something more selective than deleting the entire _fishbucket is using the btprobe command:
splunk cmd btprobe -d $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db --file <source> --reset
You can read more about btprobe here.
Please see @YannK 's answer here as well.
How many files are involved in that 3 hour window? Are they all within a single file? I guess hypothetically you could just parse out the portion you want to reindex, and just reindex that one section? Slightly less than desirable I'm sure though 😛
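To illustrate the "parse out the portion you want" idea, here is a minimal Python sketch that copies only the events inside a time window to a new file. It assumes every event starts with a timestamp like "2013-05-14 09:15:02" in the first 19 characters; the format, the function names, and any paths or dates you pass in are hypothetical and would need to match your actual logs. Lines without a leading timestamp (stack traces, for example) are treated as continuations of the previous event.

```python
from datetime import datetime
from typing import Iterable, Iterator

# Assumption: each event begins with a timestamp in this format.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
TS_LENGTH = 19  # leading characters that hold the timestamp


def extract_window(lines: Iterable[str], start: datetime, end: datetime) -> Iterator[str]:
    """Yield lines belonging to events with start <= timestamp < end."""
    keep = False
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            # No parseable timestamp: continuation of the previous event,
            # so reuse the previous keep/skip decision.
            pass
        else:
            keep = start <= ts < end
        if keep:
            yield line


def write_window(src_path: str, dst_path: str, start: datetime, end: datetime) -> int:
    """Copy the events in the window from src_path to dst_path; return the line count."""
    count = 0
    with open(src_path, encoding="utf-8", errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in extract_window(src, start, end):
            dst.write(line)
            count += 1
    return count
```

The extracted file could then be indexed on its own, for example with splunk add oneshot <extracted file> -index <index> -sourcetype <sourcetype>. Because the file name differs from the original, you would probably also want to set the source explicitly (oneshot has a -rename-source option) so the events line up with the existing ones. Test on a non-production index first.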
Thanks for the reply.
Yes, so the problem is that every host has at least 16 logs that need to be done and we have around 30-40 hosts that we are really interested in.
I will investigate btprobe and report back.