Knowledge Management

Is it correct to back up all the buckets every day?

lightech1
Path Finder

hello everyone!!

I have a customer that backs up the whole bucket structure (hot/warm/cold) every day (yes, it's horrible!), up to 400 GB per day. I was reading these links in order to advise them on other alternatives:

http://wiki.splunk.com/Deploy:BucketRotationAndRetention
https://docs.splunk.com/Documentation/Splunk/6.6.2/Admin/Indexesconf
https://answers.splunk.com/answers/8730/how-to-manage-the-size-of-my-indexes-to-fit-in-my-volumes.ht...

A little context on the infrastructure:

They have 2 indexers. The configuration of all the buckets is the default, except that I set the parameter "frozenTimePeriodInSecs" to 1 year.

I have read that if I set the auto_high_volume parameter on the high-volume indexes in order to have fewer .db files, it will increase the size of the hot and warm buckets and cold buckets will never be created.
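
Roughly, the relevant part of the configuration looks like this (the index name is just an example, and the commented line is only the change I was reading about, not something I have applied):

[my_index]
# everything else is left at the defaults; retention is set to 1 year (365 * 86400 seconds)
frozenTimePeriodInSecs = 31536000
# the setting I was reading about for high-volume indexes:
# maxDataSize = auto_high_volume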

questions:

Is it necessary to back up all the buckets?
Could I keep event data in the buckets for a certain time, in order to decide when to back them up? For example, back up the warm buckets once a week.

Any additional advice would be appreciated.

Thank you!!

Leonardo.

0 Karma
1 Solution

adonio
Ultra Champion

hello @lightech1,
first and foremost, read the link @ddrillic provided in detail.
a comment on your phrase: "if I set the auto_high_volume parameter on the high-volume indexes in order to have fewer .db files, it will increase the size of the hot and warm buckets and cold buckets will never be created."
this is very much untrue. read the detailed explanations about indexes.conf here:
https://docs.splunk.com/Documentation/Splunk/6.6.2/Admin/Indexesconf
short version: auto_high_volume is a value for the maxDataSize parameter. it sets the max size of a bucket to 10 GB and is recommended for indexes that grow more than 10 GB per day; otherwise, leave it set to "auto".
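for illustration, the two values would look roughly like this (the index names are just examples):

[big_firewall_index]
# for indexes that grow more than ~10 GB per day: buckets can reach 10 GB each
maxDataSize = auto_high_volume

[normal_index]
# the default: buckets roll at roughly 750 MB
maxDataSize = auto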
now, to your questions.
is it necessary to back up all the buckets? well, this depends on your client and what exactly they need. however, consider the fact that once a bucket has rolled from hot to warm, no changes will be applied to it anymore; therefore, backing it up today is fine, but it will be the exact same file / directory tomorrow, and probably the day after too.
regardless, discuss with your client their RPO and RTO; here are links that explain those terms:
https://en.wikipedia.org/wiki/Recovery_time_objective
https://en.wikipedia.org/wiki/Recovery_point_objective
second question / statement: what is the goal of backing up the warm buckets on a set schedule? your default indexes configuration tells me that the default number of warm buckets is 300 (read again in indexes.conf, link above, about maxWarmDBCount)
in fact, with settings like auto_high_volume and the default maxWarmDBCount, i would guess that most of your client's buckets are indeed warm today.
imho, have your client describe to you exactly what they would like to achieve and what their RTO and RPO are, and then you can decide on a backup strategy. one more option, depending on the platform you're using, is to take snapshots; read more here: https://en.wikipedia.org/wiki/Snapshot_(computer_storage)
snapshots are faster and less time and space consuming. in the past i was also able to recover a couple of systems from snapshots pretty smoothly.
apologies for the long answer
hope it helps


0 Karma

DalJeanis
Legend

Okay... It seems to me like you and your client really need to chat with someone about the purposes and methods of backing things up.

In general, the purpose of a backup is to allow you to recreate the situation as it was at a prior moment in time.

If a chunk of data is not changing, then you only need a single backup, not a daily backup, of that data.

There are two kinds of backups: full backups and incremental backups. A full backup assumes that you will have no other information about the data and will be starting from scratch. An incremental backup assumes that you have the data as of a certain date, and therefore will only need the data that has changed. (In Splunk, that means you would only need any new buckets created or modified since the prior backup.)

There are two locations for backups (loosely speaking) - onsite and offsite. Onsite backups are kept close and handy for minor problems such as loss of a hard drive. Offsite backups are kept safe and distant for major problems such as loss of an entire data center to a hard rain or a zombie horde. Nowadays, "the cloud" is generally considered offsite from everywhere.

Given the above general information, you need to set up backups to meet the needs of your customer. Backing up all buckets that have changed to the cloud each day is a reasonable approach which, with regard to Splunk data, gives you a full backup. A backup system can be set up to alternate backups so that you have a full backup that is one day old and a second backup that is two days old, and your daily backup updates the two-day-old files with any files altered in the last 48 hours. That should be much faster than a complete backup, while still leaving you with a complete backup.
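
As a rough sketch of the incremental idea (the paths below are examples only, and this assumes hot buckets are simply skipped until they roll to warm), something like this would copy each warm or cold bucket exactly once:

import os
import shutil

# a rough sketch only: the paths and index location are examples
SRC = "/opt/splunk/var/lib/splunk/defaultdb"   # example: the main index on the indexer
DST = "/backups/splunk/defaultdb"              # example backup destination

def backup_incremental(src, dst):
    # warm buckets live under db/, cold buckets under colddb/
    for sub in ("db", "colddb"):
        src_dir = os.path.join(src, sub)
        dst_dir = os.path.join(dst, sub)
        if not os.path.isdir(src_dir):
            continue
        os.makedirs(dst_dir, exist_ok=True)
        for bucket in os.listdir(src_dir):
            # hot buckets (hot_*) are still open for writes, so skip them;
            # they will be picked up on a later run, after they roll to warm
            if bucket.startswith("hot_"):
                continue
            s = os.path.join(src_dir, bucket)
            d = os.path.join(dst_dir, bucket)
            # warm and cold buckets never change once rolled, so copying
            # each one a single time is enough
            if os.path.isdir(s) and not os.path.exists(d):
                shutil.copytree(s, d)

backup_incremental(SRC, DST)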

Your mileage may vary, so make sure to discuss the data security and other characteristics of the solution with your client.

0 Karma

lightech1
Path Finder

hello DalJeanis,

Maybe you didn't understand my question!

Thank you anyway!

0 Karma


lightech1
Path Finder

hello adonio,

thanks for your reply.

If I only back up the warm buckets, which seem to cover most of the data, I will lose (in case of an emergency) the data that is in the hot buckets.

Maybe I could set "maxHotSpanSecs" to 1 day, for example, to convert a hot bucket into a warm bucket after 24 hours, so if a disruption occurs I will lose only one day of data (because I back up all the warm buckets). I don't know if that would be a good option, what do you think?
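
Something like this, I suppose (the index name is just an example):

[my_index]
# roll hot buckets to warm after at most 24 hours, even if they are not full
maxHotSpanSecs = 86400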

The client needs to have 1 year of events online.

Thanks.

0 Karma

adonio
Ultra Champion

you can do that, yes, but be very careful when applying that setting. read indexes.conf.spec as to why;
links are in the answer above.
also, how often would you like to run a backup? if it's every 24 hours and you set buckets to roll every 24 hours, i recommend setting the backup to run after the buckets roll. let's say buckets roll at 00:00; have the backup job start at 00:15 or 00:30 and include only the warm and cold buckets
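for example, a cron entry on the indexer along these lines (the backup script path is just an example, not an existing script):

# run the bucket backup at 00:15 every day, after the buckets have rolled at 00:00
15 0 * * * /opt/scripts/backup_warm_cold_buckets.sh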
good luck!

0 Karma

lightech1
Path Finder

sorry, my mistake. I was thinking about setting the parameter "maxHotSpanSecs" to 3 days, maybe, and doing the backup of the warm buckets every 3 days, for example.

thanks!

0 Karma

lightech1
Path Finder

another comment:
It seems that the parameter "maxHotSpanSecs" applies to hot/warm buckets; I see in indexes.conf:

maxHotSpanSecs = <positive integer>

* Upper bound of timespan of hot/warm buckets in seconds.

So if I set maxHotSpanSecs to 24 hours, will the hot and warm buckets only keep 1 day of data? Because in that case the idea isn't good.

question 2: how can I know when the buckets roll?

thanks in advance!!

0 Karma

adonio
Ultra Champion

it only applies to the hot buckets; it tells splunk how long a bucket stays open for data.
only hot buckets are open for new data, and therefore warm and cold data size and retention are a function of that setting...
it means that even if a bucket is not full, it will roll to warm after 24 hours
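as for seeing when buckets actually roll, one option (just an example; the index name is a placeholder) is the dbinspect search command, which lists each bucket's current state and time range:

| dbinspect index=main
| table bucketId, state, startEpoch, endEpoch, modTime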
back to the question: can you focus more on exactly what it is that your client is trying to solve? what is the client's RTO and RPO?
also see @DalJeanis' answer regarding backup types, incremental backup, and full backup

0 Karma

ddrillic
Ultra Champion

The following can probably assist you - Back up indexed data

0 Karma