I will be feeding 10 GB per day into 2 Splunk indexers (clustered environment).
Replication Factor = 2
Search Factor = 2
How do I estimate the storage size for index data on each indexer?
Assume the data needs to stay searchable for about 1 year.
We can estimate to an extent, but it will depend on a variety of factors:
We'll make a conservative estimate, assuming that after compression and TSIDX creation your data will be 75% of its original size - and we'll also assume for the time being you will not have any summary or acceleration data...
10GB * 365 days * 0.75 = 2,737.5GB, so call it 2.8T of space before replication. With ideal load balancing across indexers, each should use 1.4T of space before replication. Your RF=2/SF=2 clustering across two indexers means every bucket is stored on both indexers, so each indexer will need 2X that storage, and you'll need 2.8T of storage per indexer.
I would include some extra bytes for filesystem overhead, and other things like your _internal indexes and round it up to 3T.
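The arithmetic above can be sketched as a small calculator so you can plug in your own numbers. This is a rough sketch, not an official Splunk sizing tool: the function name and parameters are made up for illustration, and it assumes even load balancing across indexers and that every replicated copy is searchable (true here, since SF equals RF).

```python
def per_indexer_storage_gb(daily_ingest_gb, retention_days, disk_ratio,
                           replication_factor, num_indexers):
    """Estimate index storage needed on each indexer, in GB.

    disk_ratio is the assumed on-disk size (compressed rawdata + TSIDX)
    as a fraction of raw size; 0.75 is the conservative guess used above.
    """
    # On-disk size of a single copy of the data over the retention window.
    single_copy_gb = daily_ingest_gb * retention_days * disk_ratio
    # The cluster keeps replication_factor copies of every bucket.
    cluster_total_gb = single_copy_gb * replication_factor
    # Assumption: forwarders spread data evenly across all indexers.
    return cluster_total_gb / num_indexers


estimate = per_indexer_storage_gb(
    daily_ingest_gb=10, retention_days=365, disk_ratio=0.75,
    replication_factor=2, num_indexers=2,
)
print(f"{estimate:.1f} GB per indexer")  # prints "2737.5 GB per indexer"
```

That figure does not include filesystem overhead, _internal indexes, or any summary/acceleration data, which is why rounding up to 3T is sensible.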
The only assumption here which is really hard to validate is whether or not your data post-indexing will be 75% of the raw size. For typical IT data, this is a pretty conservative estimate and should leave you some wiggle room. But the only way you'll know for sure is to take, say, a 1GB sample of your logs, index it, and see how much disk space it winds up needing. Then you can adjust the 75% up or down as needed.
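Once you have indexed a sample, adjusting the estimate is a one-line ratio. The sketch below shows the idea; the sample sizes are hypothetical placeholders, not measurements, and the helper name is made up for illustration.

```python
def measured_disk_ratio(raw_sample_gb, indexed_size_gb):
    """Fraction of the raw size that the sample occupies once indexed."""
    return indexed_size_gb / raw_sample_gb


# Hypothetical example: a 1 GB raw sample that takes 0.5 GB on disk.
# Replace both values with what you actually observe.
ratio = measured_disk_ratio(raw_sample_gb=1.0, indexed_size_gb=0.5)

# Re-run the single-copy estimate (10 GB/day for 365 days) with that ratio.
single_copy_gb = 10 * 365 * ratio
print(f"ratio={ratio:.2f}, one year of data is about {single_copy_gb:.1f} GB per copy")
```

A larger and more representative sample gives a more trustworthy ratio, since compression varies a lot between log types.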
GOOD POINT! Each indexer should be getting 5GB/day which is then duplicated 2x to 10GB/day. DERP. I fixed the math. Thanks @martin_mueller!
While I agree with most of your calculations, I'd have one difference to ponder: If you have a cluster of 2 with SF of 2, each indexer should be storing 100% of the incoming data - not 200%.
As a result, I'd expect 1.4T per indexer before replication (with forwarders load balancing across both) and 2.8T per indexer after replication.