recommended index sizes

awurster
Contributor

Hi guys -

I have a stand-alone Splunk server that I'm trying to size appropriately. We have a fixed 3 TB volume to work with.

I am wondering how large or small to make the various indexes, especially the built-in ones: summary, _internal, etc.

It seems like the default maximum index size (500,000 MB per index) would theoretically allow the indexes together to overrun the volume. So I guess my questions are:

1 - Can / should we resize the internal indexes (e.g. _internal, history, _audit) so that they are aware of the available storage?
2 - What percentage should we reserve for summary indexing? 25% of the desired index (and/or main)?

cheers,

andrew

bmacias84
Champion

That all depends on your requirements for the data stored in your indices. Also, how is your storage divided among your hot, warm, cold, frozen, and archive buckets? Will you be summary indexing all your data, and how will that be broken out: hourly, daily, weekly, monthly? Do you have retention or security policies for certain data sources or types?

This varies dramatically depending on your requirements.

The 500,000 MB default (maxTotalDataSizeMB) is the maximum size of each index across all of its buckets (hot, warm, and cold).
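A quick back-of-the-envelope sketch of the overrun concern from the question, in Python. It assumes 3 TB means 3,000,000 MB (decimal units), that every index listed keeps the 500,000 MB default (some built-in indexes may ship with different defaults, so check your own indexes.conf), and that "web" and "firewall" are hypothetical custom indexes added for illustration.

```python
# Back-of-the-envelope check: can every index keep the default cap on a 3 TB volume?
# Assumptions (illustrative only, not read from a real indexes.conf):
#   - 3 TB is treated as 3,000,000 MB (decimal units)
#   - each index listed below uses the 500,000 MB default for maxTotalDataSizeMB
VOLUME_MB = 3_000_000
DEFAULT_CAP_MB = 500_000

# "web" and "firewall" are hypothetical custom indexes added for illustration.
indexes = ["main", "summary", "_internal", "_audit", "history", "web", "firewall"]


def total_cap_mb(names, cap_mb=DEFAULT_CAP_MB):
    """Worst-case disk use if every index grows to the same hot+warm+cold cap."""
    return len(names) * cap_mb


def max_indexes_that_fit(volume_mb=VOLUME_MB, cap_mb=DEFAULT_CAP_MB):
    """Number of indexes at this cap whose combined caps still fit on the volume."""
    return volume_mb // cap_mb


if __name__ == "__main__":
    total = total_cap_mb(indexes)
    print(f"{len(indexes)} indexes x {DEFAULT_CAP_MB:,} MB = {total:,} MB")
    print(f"volume: {VOLUME_MB:,} MB, overcommitted: {total > VOLUME_MB}")
    print(f"indexes that fit at the default cap: {max_indexes_that_fit()}")
```

With these seven indexes the caps add up to 3,500,000 MB, so the volume is overcommitted on paper; only six indexes at the default cap would fit. Frozen data is not counted, since the cap covers hot, warm, and cold buckets only.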

Additional reading:

How Splunk stores indexes

Set a retirement and archiving policy

Set up multiple indexes

How indexing works

Estimate index size (Splunk Wiki article on how to perform estimates)

bmacias84
Champion

In most cases, the data will have rolled to frozen and been deleted before the maximum index size (maxTotalDataSizeMB) is approached. Be careful when you modify indexes.conf: buckets may roll over and cause data to be deleted.
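A simplified Python model of which limit an index hits first: age (frozenTimePeriodInSecs) or size (maxTotalDataSizeMB). It assumes steady daily ingest, a fixed on-disk size of about half the raw input (the 0.5 ratio is an assumption; the actual ratio depends on your data, which is what the Estimate index size article covers), and a hypothetical 90-day retention. Real buckets roll as whole units, so actual behaviour is coarser than this.

```python
# Simplified model: which retention limit does an index hit first,
# age (frozenTimePeriodInSecs) or size (maxTotalDataSizeMB)?
# Assumes steady daily ingest and a fixed on-disk/raw ratio; real buckets
# roll as whole units, so actual behaviour is coarser than this.
SECONDS_PER_DAY = 86_400


def days_until_size_cap(max_total_mb, daily_raw_mb, disk_ratio):
    """Days of data that fit under the size cap.

    disk_ratio is an assumed on-disk size as a fraction of raw input
    (0.5 means the indexed data takes about half the raw volume).
    """
    return max_total_mb / (daily_raw_mb * disk_ratio)


def first_limit(max_total_mb, frozen_secs, daily_raw_mb, disk_ratio):
    """Return ("age" or "size", days of data the index actually keeps)."""
    age_days = frozen_secs / SECONDS_PER_DAY
    size_days = days_until_size_cap(max_total_mb, daily_raw_mb, disk_ratio)
    if age_days <= size_days:
        return "age", age_days
    return "size", size_days


if __name__ == "__main__":
    retention = 90 * SECONDS_PER_DAY  # hypothetical 90-day retention policy
    # Low-volume index: the age limit freezes data long before the size cap.
    print(first_limit(500_000, retention, daily_raw_mb=2_000, disk_ratio=0.5))
    # High-volume index: the size cap is reached first.
    print(first_limit(500_000, retention, daily_raw_mb=50_000, disk_ratio=0.5))
```

Under these assumptions, an index taking in 2,000 MB of raw data per day is frozen by age after 90 days (the size cap alone would allow 500 days), while one taking in 50,000 MB per day reaches the 500,000 MB cap after about 20 days. The second case is where lowering maxTotalDataSizeMB would delete data sooner than the retention policy suggests.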

awurster
Contributor

Thank you. So I think it's fair to say that the sum of all your index size limits should ideally not exceed the size of your available disk space / volume(s). It seems very unlikely that the internal indexes will use up much space, but your main / primary indexes should never be allowed to exceed 100% of available space; perhaps even 90 or 95% is better.
I'm somewhat comparing this to partitioning new disk(s) during an initial OS install (e.g. swap, home, OS, etc.): in most cases the installer won't let you allocate more than 100%.
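The partitioning analogy can be sketched in Python: split a fixed budget by percentage, leave free-space headroom, and check that the per-index caps never add up to more than the target fill level. The percentage shares are hypothetical (the 25% summary share is simply the figure floated in the question, not a recommendation), and the 90% fill level follows the 90 to 95% suggestion above.

```python
# Plan per-index size caps the way you would partition a disk:
# split a fixed budget by percentage and leave free-space headroom.
# Shares below are hypothetical, not recommendations.


def plan_caps(volume_mb, percent_shares, fill_percent=90):
    """Return {index: cap_mb} whose total stays within fill_percent of the volume."""
    if sum(percent_shares.values()) != 100:
        raise ValueError("percent_shares must add up to 100")
    budget_mb = volume_mb * fill_percent // 100
    # Integer floor division, so rounding can never push the total over budget.
    return {name: budget_mb * pct // 100 for name, pct in percent_shares.items()}


def is_overcommitted(volume_mb, caps_mb, fill_percent=90):
    """True if the caps together could fill more than fill_percent of the volume."""
    return sum(caps_mb.values()) * 100 > volume_mb * fill_percent


if __name__ == "__main__":
    shares = {"main": 65, "summary": 25, "_internal": 5, "_audit": 3, "history": 2}
    caps = plan_caps(3_000_000, shares)
    for name, cap in caps.items():
        print(f"[{name}] maxTotalDataSizeMB = {cap}")
    print("overcommitted:", is_overcommitted(3_000_000, caps))
```

On a 3,000,000 MB volume at 90% fill, this gives main 1,755,000 MB and summary 675,000 MB, with all caps totalling exactly 2,700,000 MB. Like a disk partitioner, it refuses shares that do not add up to 100%.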

bmacias84
Champion

In my environment I have different types of storage for hot (local SSD), warm (tier 2 SAN), and cold (tier 3 SAN). In the end, it comes down to knowing your data and configuring indexes based on retention, security, and importance. That includes configuring settings like maxHotSpanSecs (upper bound of the timespan of hot buckets) and maxHotIdleSecs (maximum life of a hot bucket). Hope this helps.
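A Python sketch that renders one indexes.conf stanza placing an index on tiered storage and setting the two options named above. The mount points, index name, and values are hypothetical. Note that homePath holds both hot and warm buckets while coldPath holds cold buckets, so these paths alone only separate hot/warm storage from cold storage.

```python
import configparser
import io


def render_index_stanza(name, fast_root, slow_root, max_total_mb,
                        max_hot_span_secs, max_hot_idle_secs):
    """Build one indexes.conf stanza as text.

    homePath holds hot and warm buckets, coldPath holds cold buckets, so this
    only separates hot/warm storage from cold storage.
    """
    cp = configparser.ConfigParser(interpolation=None)
    cp.optionxform = str  # keep Splunk's camelCase setting names (default lowercases them)
    cp[name] = {
        "homePath": f"{fast_root}/{name}/db",
        "coldPath": f"{slow_root}/{name}/colddb",
        "thawedPath": f"{slow_root}/{name}/thaweddb",
        "maxTotalDataSizeMB": str(max_total_mb),
        "maxHotSpanSecs": str(max_hot_span_secs),
        "maxHotIdleSecs": str(max_hot_idle_secs),
    }
    buf = io.StringIO()
    cp.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical mount points and values, chosen only for illustration.
    print(render_index_stanza("web", "/mnt/ssd/splunk", "/mnt/san_tier3/splunk",
                              max_total_mb=200_000,
                              max_hot_span_secs=86_400,
                              max_hot_idle_secs=3_600))
```

Setting optionxform to str matters here: configparser lowercases keys by default, which would turn maxTotalDataSizeMB into a name Splunk does not recognize.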

bmacias84
Champion

@awurster, "what happens when an indexer runs out of space on disk?" Your indexers will pause (stop indexing), which has the potential for data loss. You can minimize possible data loss by using indexer acknowledgement and by increasing the input and output queueSize for streamed data sources. _internal and summary indexes are just indexes, so the same rules apply and they will be paused as well. Once the disk space issue has been resolved, your indexer will continue indexing. By default, indexing pauses when free disk space drops to 2000 MB. http://docs.splunk.com/Documentation/Splunk/5.0/Indexer/Setlimitsondiskusage

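A small Python monitoring sketch of the pause rule described above, useful for alerting before the threshold is reached. It uses the 2000 MB default cited for Splunk 5.0 (the setting is minFreeSpace in the [diskUsage] stanza of server.conf; check the value for your version), models the rule as a simple below-threshold check, assumes MB means 2**20 bytes, and uses the SPLUNK_DB environment variable as the path to check. It is not Splunk's own implementation.

```python
import os
import shutil

# 2000 MB: the minFreeSpace default cited above for Splunk 5.0 (server.conf, [diskUsage]).
MIN_FREE_SPACE_MB = 2000
BYTES_PER_MB = 1024 * 1024  # assumption: MB means 2**20 bytes here


def free_mb(path):
    """Free space, in MB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free // BYTES_PER_MB


def would_pause(free_space_mb, min_free_mb=MIN_FREE_SPACE_MB):
    """True if free space is below the threshold at which indexing pauses."""
    return free_space_mb < min_free_mb


def mb_until_pause(free_space_mb, min_free_mb=MIN_FREE_SPACE_MB):
    """How much more data can land on this filesystem before indexing pauses."""
    return max(0, free_space_mb - min_free_mb)


if __name__ == "__main__":
    # Check the filesystem behind SPLUNK_DB; fall back to the current directory.
    path = os.environ.get("SPLUNK_DB", ".")
    free = free_mb(path)
    print(f"{path}: {free} MB free, {mb_until_pause(free)} MB until pause, "
          f"paused: {would_pause(free)}")
```

Because the pause applies to every index on that filesystem, including _internal and summary, the useful number to alert on is mb_until_pause rather than the size of any single index.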

awurster
Contributor

Thanks.

I guess in that case my question is more towards "what happens when an indexer runs out of space on disk?" and then "if something like main or another regular index fills up, what happens to retention of data in other key places like _internal or summary?"

I just want to avoid any disasters once the disk fills up.
