
Why can data that was searchable yesterday no longer be found today?

robertpenberthy
Explorer

We're currently running Splunk Enterprise 6.1.2.

A few times in the past few months, we've run into a problem where data we've had in the index has disappeared. It's frustrating, and we need to prevent it from happening again. The first two times were on our server indexing logs from our test environments. The third time it happened on the server indexing logs from our production environments.

The first time we experienced this issue, we had over a year's worth of data in the index, and the next day we couldn't search back before the weekend. The index was nowhere near its maximum size, and the size of the index after the data went missing was less than it was previously.

The second time happened a couple of months later. We went from the couple of months of data we had accumulated down to about two weeks of data.


Today's occurrence with the missing production data has made this an immediate concern. We had an engineer resuming his research into events from Oct 27th, and the data he was looking at yesterday is missing today. The index was near capacity when I checked after the engineer reported the missing data to me. However, the index still has events going back as far as Oct 6th.

For this particular index, we have a suite of applications on many different hosts forwarding their data. For a number of these hosts and applications we can pull up indexed data going back to Oct 6th, but for one application, the one whose customers were reporting issues... we cannot pull up data from before late in the day on Oct 27th.

Why/How is this happening?

I've tried searching for an answer to this issue and cannot find one. I've perused the _audit index for anything that would explain the missing data, but have found nothing. What else can I look at to explain why this happened and, hopefully, to prevent it from happening again or at least control when it happens?
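In case it helps, this is roughly the kind of search I can run against Splunk's internal logs to look for bucket activity on this index (a rough sketch only; bc is our index name, and the component names are guesses based on what splunkd typically logs about bucket movement):

index=_internal sourcetype=splunkd idx=bc (component=BucketMover OR component=DbMaxSizeManager)
| sort - _time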

Thanks in advance for any of your suggestions/answers to my plea for help!


Update:

The following was pulled from the splunkd.log file; I do not know whether it does or does not account for the issue I reported:

11-06-2014 10:24:01.992 -0600 WARN  BucketMover - Unexpected failure to parse bucket='/opt/splunk/var/lib/splunk/bc/db/hot_v1_814'
11-06-2014 10:24:01.992 -0600 WARN  BucketMover - Unexpected failure to parse bucket='/opt/splunk/var/lib/splunk/bc/db/hot_v1_815'
11-06-2014 10:24:01.992 -0600 WARN  BucketMover - Unexpected failure to parse bucket='/opt/splunk/var/lib/splunk/bc/db/hot_v1_816'
11-06-2014 10:24:01.992 -0600 INFO  DbMaxSizeManager - Moving up to 8 hot+warm buckets, start from oldest by LT, until achieve compliance (size: current=2097258496 (2000MB,1GB) max=2097152000 (2000MB,1GB))
11-06-2014 10:24:01.997 -0600 INFO  DbMaxSizeManager - Will chill bucket=/opt/splunk/var/lib/splunk/bc/db/db_1415124594_1415087509_806 LT=1415124594 size=399175680 (380MB)
11-06-2014 10:24:02.015 -0600 INFO  BucketMover - idx=bc Moving bucket='db_1415124594_1415087509_806' because maximum number of warm databases exceeded, starting warm_to_cold: from='/opt/splunk/var/lib/splunk/bc/db' to='/storage/splunk/bc/colddb'
11-06-2014 10:24:02.015 -0600 INFO  BucketMover - idx=bc bucket=db_1415124594_1415087509_806 Firing async chiller: from='/opt/splunk/var/lib/splunk/bc/db' to='/storage/splunk/bc/colddb'
11-06-2014 10:24:02.015 -0600 INFO  DbMaxSizeManager - Bucket moved successfully (size: cur=1698082816 (1619MB,1GB), max=2097152000 (2000MB,1GB))
11-06-2014 10:24:13.689 -0600 INFO  DatabaseDirectoryManager - Writing a bucket manifest in hotWarmPath='/opt/splunk/var/lib/splunk/bc/db'.  Reason='Updating bucket, bid=bc~806~5C35B09D-9D10-405D-B658-C20C93219352'
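To get a picture of what the buckets for this index look like right now, something like the following could also be run (a sketch; bc is our index, and the field names are as I understand dbinspect's output):

| dbinspect index=bc
| sort startEpoch
| eval earliest=strftime(startEpoch, "%Y-%m-%d %H:%M:%S"), latest=strftime(endEpoch, "%Y-%m-%d %H:%M:%S")
| table bucketId state earliest latest sizeOnDiskMB path

The oldest startEpoch of any remaining bucket should line up with the Oct 6th boundary the engineer is seeing.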

weeb
Splunk Employee

I have seen a few of these issues recently. The fix required increasing the default maximum number of allowed warm buckets.

This is not always the magic fix, but if the customer has made configuration changes that result in a high number of warm buckets being created, this might be it. Such a state often indicates an incorrect index configuration; by default Splunk organizes and optimizes the number and sizing of buckets, and we do not recommend tuning these parameters.

Support recommends using the parameter maxVolumeDataSizeMB to control total volume sizing instead.

http://docs.splunk.com/Documentation/Splunk/6.2.1/Admin/indexesconf

maxWarmDBCount =

• The maximum number of warm buckets.
• Warm buckets are located in the homePath for the index.
• If set to zero, it will not retain any warm buckets (will roll them to cold as soon as it can)
• Defaults to 300.
• Highest legal value is 4294967295

Customers have reported success with increasing it to 3000 but, as stated, take a closer look at the parameters set in indexes.conf.

I would recommend against trying to explicitly control the number or sizing of buckets, as this is likely to negatively impact performance.
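For illustration, a minimal indexes.conf sketch of the volume-based approach might look like this (the volume name and size cap are placeholders; the paths are taken from the log excerpt above, so adjust everything for your environment):

[volume:hot_warm]
path = /opt/splunk/var/lib/splunk
# Cap the total size of everything stored under this volume (placeholder value).
maxVolumeDataSizeMB = 500000

[bc]
# Hot/warm buckets live in the volume, so the cap above governs them.
homePath = volume:hot_warm/bc/db
coldPath = /storage/splunk/bc/colddb
thawedPath = $SPLUNK_DB/bc/thaweddb
# Raise this only if a closer look shows an unusually large number of warm buckets;
# as noted above, some customers have gone as high as 3000.
# maxWarmDBCount = 3000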
