We recently had to move our Splunk installation and indexes to a new AWS instance, which was somewhat complicated by the size of the indexes. Since the move, most of the indexes have been updating correctly, but our most important custom index has not.
Partway through the move, indexing was restarted and then stopped. When the data move was complete, we had bucket ID conflicts. We followed all the instructions we could find to correct the issues: we renamed all the conflicting buckets and rebuilt all indexes and metadata (splunk _internal call /data/indexes/*/rebuild-metadata-and-manifests). The other affected indexes are now working correctly, but our most important index has not processed anything later than Dec 4. We get these errors in the log files about every second or so:
12-23-2013 02:47:14.427 +0000 ERROR BTree - 133th child has invalid offset: indexsize=32434216 recordsize=77042296, (Leaf)
12-23-2013 02:47:14.427 +0000 ERROR BTreeCP - addUpdate CheckValidException caught: BTree::Exception: Validation failed in checkpoint
We have tried repairing the buckets and metadata several times. Splunk found errors and repaired them, but the BTree error continued. We've stopped and restarted Splunk a number of times to retest, and new repairs were made to the buckets each time. One problematic bucket that refused to be repaired has been moved out of the index into /root.
None of this affected the BTree error. The data still isn't showing up in the Splunk web interface when we run searches.
What else can we try to repair this index? I have not found any other reports of a similar error message when searching answers.splunk.com.
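For anyone wanting to double-check for leftover bucket ID conflicts like the ones described above, here is a minimal sketch of the kind of check that could be run over a listing of an index's db directory. It assumes a non-clustered setup where warm/cold bucket directories are named db_<newest_time>_<oldest_time>_<id>. The function name and the sample directory names are hypothetical, for illustration only, and not part of any Splunk tooling.

```python
import re
from collections import defaultdict

# Assumption: non-clustered warm/cold bucket directories are named
# db_<newest_time>_<oldest_time>_<id>. Anything else is ignored.
BUCKET_RE = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")


def find_conflicting_bucket_ids(names):
    """Return {bucket_id: [directory names]} for IDs used more than once.

    `names` is any iterable of directory names, e.g. the result of
    os.listdir() on an index's db directory (path not shown here).
    """
    by_id = defaultdict(list)
    for name in names:
        match = BUCKET_RE.match(name)
        if match:
            by_id[int(match.group(3))].append(name)
    # Keep only the IDs that appear on more than one directory.
    return {bid: sorted(dirs) for bid, dirs in by_id.items() if len(dirs) > 1}


if __name__ == "__main__":
    # Hypothetical directory names, not taken from our system.
    sample = [
        "db_1386115200_1386028800_12",
        "db_1386201600_1386115200_12",
        "db_1386288000_1386201600_13",
    ]
    for bucket_id, dirs in find_conflicting_bucket_ids(sample).items():
        print(bucket_id, dirs)
```

This only reports duplicate IDs by name; it does not inspect bucket contents, and it would not detect the BTree checkpoint problem itself.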