Archival on Indexer

mandarpimplapur
Explorer

I have a Splunk 5.x setup running with the default config, as shown below:

maxDataSize = auto_high_volume (i.e. 10GB)
maxHotSpanSecs = 7776000 ( 90 Days)
maxTotalDataSizeMB = 500000 (i.e. 500 GB)
frozenTimePeriodInSecs = 31104000 (i.e. about 1 year; the default is about 6 years)

I also have a number of indexes, such as audit, internal, history, summary and main.
Of these, the main index is 1.09 TB in size and continuously increasing.

So, how can we archive the main index using the above parameters, and can we control this?

1 Solution

jkat54
SplunkTrust

Hi mandarpimplapure,

These settings apply per index. That is to say, if you have 10 indexes with a max of 500GB each, then you could grow to 5TB WITHOUT REPLICATION. If you are in a cluster and you have replication, you need to calculate disk usage differently.

This is a great document for setting up your retention:
http://docs.splunk.com/Documentation/Splunk/6.4.2/Indexer/Setaretirementandarchivingpolicy

This is a great formula for calculating your disk space needs PER INDEX in a CLUSTERED environment:
(Original Data Size * 0.35 * Replication Factor) + (Original Data Size * 0.15 * Search Factor) = Total Disk Consumption
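
To make the formula concrete, here is a minimal Python sketch that just plugs numbers into it. The function name and the sample inputs (1000 GB of original data, replication factor 3, search factor 2) are made up for illustration and are not from this thread.

```python
def clustered_disk_usage_gb(original_gb, replication_factor, search_factor):
    """Plug values into the per-index clustered sizing formula quoted above."""
    # First term of the formula: Original Data Size * 0.35 * Replication Factor
    replication_term = original_gb * 0.35 * replication_factor
    # Second term of the formula: Original Data Size * 0.15 * Search Factor
    search_term = original_gb * 0.15 * search_factor
    return replication_term + search_term


# Hypothetical inputs, chosen only to show the arithmetic.
estimate = clustered_disk_usage_gb(original_gb=1000, replication_factor=3, search_factor=2)
print(f"Estimated total disk consumption: {estimate:.0f} GB")
```

Run it once per index with that index's own data volume, then add the results to get a rough total.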

There is also a great website for this:
https://splunk-sizing.appspot.com/

Best of luck!

jkat54
SplunkTrust

Thanks for the upvote @mandarpimplapure!

If you feel this answer solves the problem, please mark it as the answer by clicking the link below the answer.

mandarpimplapur
Explorer

Thanks for your input. I also have another question:

Configurations:
maxDataSize = auto_high_volume
maxHotSpanSecs = 90 days
maxTotalDataSizeMB = 1.5 TB
frozenTimePeriodInSecs = 1 Yrs

Space taken by the hot+warm buckets = (maxWarmDBCount + maxHotBuckets) * maxDataSize = (300 + 10) * 10GB = 3100GB for auto_high_volume

Space taken by the cold buckets = maxTotalDataSizeMB - "size of the hot+warm buckets" = 1100GB - 3100GB = -2000GB for auto_high_volume (so no cold buckets exist)
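
The same arithmetic as a short Python sketch, using only the figures quoted above. The variable names are illustrative, and the 1100GB total is the number used in the subtraction, not the 1.5 TB listed under Configurations.

```python
# Figures as quoted in this post; names are illustrative only.
bucket_size_gb = 10        # maxDataSize = auto_high_volume, taken as 10GB
max_warm_db_count = 300
max_hot_buckets = 10
max_total_gb = 1100        # total used in the subtraction above

# Upper bound on hot+warm space if every bucket reached full size.
hot_warm_ceiling_gb = (max_warm_db_count + max_hot_buckets) * bucket_size_gb

# What would be left over for cold buckets under the total size cap.
cold_remainder_gb = max_total_gb - hot_warm_ceiling_gb

print(f"hot+warm ceiling: {hot_warm_ceiling_gb} GB")
print(f"left for cold:    {cold_remainder_gb} GB")

if cold_remainder_gb < 0:
    # The total cap is smaller than the hot+warm ceiling, so the cap
    # is reached before any room is left for cold buckets.
    print("Total size cap is reached before the hot+warm ceiling.")
```

A negative remainder only says that the hot+warm ceiling is larger than the total cap, so with these numbers the total size limit is the one that gets hit first.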

The current main index size is 1.1TB and keeps increasing, with warm buckets of 12GB each. The main index contains data from the last 5 years.

How can we purge this data from the main index? Also, if maxDataSize is set to 10GB, why are the buckets being created with 12GB of data?

Can someone kindly provide the correct archival policy, as well as the settings for the indexes and buckets?


Richfez
SplunkTrust

"Correct" is only how you define it. What exactly do you want done? "Purging data" is easy enough, but how much? I assume keep 1 year?

jkat54 was right on the money - this has to be done on EACH index. Once changed you have to restart Splunk to make the changes take effect, but when you do they will take effect immediately. Also note "frozenTimePeriodInSecs" is in seconds, "1Yrs" won't work. But if you just meant you'd figure that out and put in the right number, well, that's fine, I just wanted to double-check.
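
If it helps, here is a quick Python sketch for turning a retention period in days into the plain number of seconds that these settings expect. The helper name is just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def days_to_secs(days):
    """Convert a retention period in days to seconds."""
    return days * SECONDS_PER_DAY


# One 365-day year, expressed in seconds.
one_year_secs = days_to_secs(365)
print(one_year_secs)

# Going the other way: how many days the values from the original post cover.
frozen_days = 31104000 // SECONDS_PER_DAY     # frozenTimePeriodInSecs
hot_span_days = 7776000 // SECONDS_PER_DAY    # maxHotSpanSecs
print(frozen_days, hot_span_days)
```

So the 31104000 from the original post works out to 360 days, and a full 365-day year would be 31536000.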

I wouldn't worry much about hot/warm and bucket sizes and whatnot; those honestly are details that probably don't matter much in your case. It's worth noting that it's usually wise to make ONE change at a time and confirm it worked correctly before moving on to the next change. If the bucket sizes and so on still bother you after you've got your retention straightened out, mark this question as answered and create a new question for that issue.
