How do I restore data from a frozen bucket?

athorat · ‎08-05-2016

We had a situation where we lost last quarter's data for one specific index. We have 4 TB of disk space on the indexers, and 1.9 TB is being used.
Later, with the help of a support engineer, we realized that the index was allocated 500 GB of space, and hence the data was deleted.
(For the time being, we have increased maxTotalDataSizeMB to 700 GB.)
As we do not have an archive path set up, we will not be able to restore the data.

In our internal testing, we restored the buckets from the frozen db to thaweddb in a new index and used the rebuild command.
We were able to get data back from 2010 to 2014 from the frozen db.
So my question is: if we have data from 2010 to 2014, how do I find the data for the last quarter? The data is missing only from March 10th to June 12th.

Richfez · ‎08-06-2016

The simple and probably disheartening answer is "where is the data itself?"

As you found, if you can get your hands on the data as you did with the 2010-2014 stuff, you can likely figure out a way to make it searchable again. If you do not have that data, Splunk can't generate it out of thin air. If you have backups from that time, you could look there.

If you still have any of the original source data around, you could ingest that again (checking the MAX_DAYS_AGO = <integer> setting in props.conf to be sure you'll get correct timestamps, since they'll be old events!). Perhaps you have backups of the syslog/whatever server's log files from that period? You could restore those (again, after adjusting your timestamp settings as mentioned previously).

Otherwise that data may be gone for good.

dwaddle · ‎08-06-2016

Rich is completely right here. You either have the data, or you don't. Admittedly there is some grey area here. You could have the data and not realize it, or you could think you have the data but actually not. The important docs for the mechanics of thawing the data are here: http://docs.splunk.com/Documentation/Splunk/6.4.2/Indexer/Restorearchiveddata

But, I think what might be most useful is to help you understand how buckets are named. When you see a bucket named something like:

db_1470164322_1470057086_10

This bucket covers a time range of Mon Aug 1 09:11:26 EDT 2016 to Tue Aug 2 14:58:42 EDT 2016. The two numbers in the directory name are time_t values (Unix epoch seconds): the first is the timestamp of the newest event in the bucket, and the second is the timestamp of the oldest. So you will have to find the buckets whose names show timestamps that overlap the time range you are seeking...
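To make the naming rule concrete, here is a rough Python sketch that decodes bucket directory names and flags the ones overlapping a wanted time range. It is only an illustration, not a Splunk tool: the helper names (bucket_range, overlaps, to_epoch) are made up for this example, the year 2016 and the UTC day boundaries for the March 10th to June 12th gap are assumptions, and every sample bucket name except the first one is invented.

```python
import re
from datetime import datetime, timezone

# Bucket directories look like db_<newest>_<oldest>_<id>; replicated copies in a
# cluster use an rb_ prefix. Anything after the id is ignored here.
BUCKET_RE = re.compile(r"^(?:db|rb)_(\d+)_(\d+)_(\d+)")


def bucket_range(name):
    """Return (oldest, newest) epoch seconds for a bucket name, or None."""
    m = BUCKET_RE.match(name)
    if m is None:
        return None
    newest, oldest = int(m.group(1)), int(m.group(2))
    return oldest, newest


def overlaps(name, start, end):
    """True if the bucket's time span intersects [start, end] (epoch seconds)."""
    span = bucket_range(name)
    if span is None:
        return False
    oldest, newest = span
    return oldest <= end and newest >= start


def to_epoch(year, month, day):
    """Midnight UTC of the given date as epoch seconds (UTC is an assumption)."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Assumed gap: March 10th through the end of June 12th, 2016.
start = to_epoch(2016, 3, 10)
end = to_epoch(2016, 6, 13)

# Only the first name comes from the thread; the others are hypothetical.
candidates = [
    "db_1470164322_1470057086_10",
    "db_1460000000_1459000000_3",
    "db_1400000000_1399000000_1",
    "not_a_bucket",
]

for name in candidates:
    if overlaps(name, start, end):
        oldest, newest = bucket_range(name)
        print(name,
              datetime.fromtimestamp(oldest, timezone.utc).isoformat(),
              datetime.fromtimestamp(newest, timezone.utc).isoformat())
```

In practice you would feed it the directory listing of wherever old buckets might still live (backups, a frozen archive, another indexer) instead of the hard-coded list. If nothing matches, the buckets for that period are not in that location.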
