Storage experts: With 20 SSDs per indexer, what's the best RAID option?

twinspop
Influencer

These will be running SUSE 12. Each SSD will be 1.6TB. The systems have hardware RAID cards, but I'm tempted to go with JBOD, and use Linux tools or even ZFS to manage the volumes.

  • RAID50? e.g., RAID5 with 5 members, 4 groups, striped
  • RAID60?
  • Multiple RAIDZ1 or -2 with ZFS?

Our storage group recommended one giant RAID5 volume, which worries me. Rebuilding a volume that size seems risky: losing a second drive during the rebuild would be a real possibility. Beyond that, single-drive failure protection across a 20-drive array seems like a bad idea.

EDIT - I'm trying to avoid RAID10, since it loses 50% of the raw storage.
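
For a rough sense of the trade-off between these layouts, here is a minimal sketch of the capacity arithmetic for 20 drives of 1.6 TB each. The RAID60 group size (2 groups of 10) is an assumption, since the post does not specify one, and the numbers ignore file system and controller overhead. RAIDZ1/RAIDZ2 vdevs of the same widths would follow the same parity arithmetic, before ZFS overhead.

```python
DRIVES = 20
DRIVE_TB = 1.6  # per-drive capacity from the post

# (name, number of striped groups, redundant drives per group)
LAYOUTS = [
    ("RAID5, one 20-drive group", 1, 1),
    ("RAID50, 4 groups of 5", 4, 1),
    ("RAID60, 2 groups of 10", 2, 2),   # group size is an assumption
    ("RAID10, 10 mirrored pairs", 10, 1),
]


def usable_tb(groups, redundant_per_group, drives=DRIVES, drive_tb=DRIVE_TB):
    """Capacity left after subtracting parity/mirror drives."""
    return (drives - groups * redundant_per_group) * drive_tb


def summarize():
    """Return (name, usable TB, efficiency, guaranteed failures survived)."""
    raw_tb = DRIVES * DRIVE_TB
    rows = []
    for name, groups, redundant in LAYOUTS:
        tb = usable_tb(groups, redundant)
        # Worst case: every failure lands in the same group, so only the
        # per-group redundancy is guaranteed.
        rows.append((name, tb, tb / raw_tb, redundant))
    return rows


for name, tb, eff, guaranteed in summarize():
    print(f"{name:28s} {tb:5.1f} TB usable ({eff:.0%}), "
          f"survives any {guaranteed} drive failure(s)")
```

Striped layouts can survive more failures than the guaranteed figure when the failed drives fall in different groups, but only the worst case is counted here.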

masonmorales
Influencer

We use RAID5 on our indexers, which are 20x 1.92 TB SSDs. Rebuild time is ~4 hours or so in our environment, but that depends on whether you are using hardware vs software RAID, CPU speed, etc. We are also in an indexer cluster, so we can afford an indexer being down for a rebuild that will take several hours.
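
As a back-of-the-envelope check on that rebuild figure, the sketch below derives the sustained rate implied by rebuilding one 1.92 TB drive in about 4 hours, then applies it to a 1.6 TB drive. It assumes rebuild time scales linearly with drive capacity and that the rate carries over to other hardware, neither of which is guaranteed; the function names are illustrative only.

```python
def implied_rebuild_rate_mb_s(drive_tb, hours):
    """Sustained write rate (MB/s) needed to rebuild one drive in `hours`."""
    return drive_tb * 1e12 / (hours * 3600) / 1e6


def estimated_rebuild_hours(drive_tb, rate_mb_s):
    """Hours to rebuild one drive at a given sustained rate (MB/s)."""
    return drive_tb * 1e12 / (rate_mb_s * 1e6) / 3600


# Figures from the post: 1.92 TB drives, roughly 4 hours per rebuild.
rate = implied_rebuild_rate_mb_s(1.92, 4)
# Assumption: the same rate applies to the 1.6 TB drives in the question.
hours = estimated_rebuild_hours(1.6, rate)
print(f"implied rate: {rate:.0f} MB/s, 1.6 TB rebuild: {hours:.1f} h")
```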

For the file system, performance-wise there is no difference. We use XFS.

Are you going to be clustering your indexers? If so, there's really no reason not to go with RAID 5.

If you are in a non-clustered environment, RAID50 would work fine as well.

twinspop
Influencer

Follow-up: RAID5 was okay at first, but the relatively poor IO perf caught up with us. Eventually I had to re-create the volumes as RAID10. SmartStore made this fairly easy. We just updated these servers and went with fewer drives in RAID0, relying on remote storage (S2) and clustering for all redundancy.

masonmorales
Influencer

If you're interested in performance differences, you can check out the .Conf 2016 talk I did, "Architecting Splunk for Epic Performance at Blizzard Entertainment" at https://conf.splunk.com/sessions/2016-sessions.html

twinspop
Influencer

We are clustered. Currently 5 indexers in each of 2 different clusters, soon to be 12 each. Thanks for your input!

masonmorales
Influencer

What's your RF/SF?

twinspop
Influencer

For this project we plan to be RF3/SF2.
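
To make the redundancy that RF3/SF2 provides at the cluster level concrete, here is a small sketch. The peer-failure figures follow from the definitions of replication factor and search factor. The storage multiplier takes the rawdata and index-file ratios as inputs, and the 0.15 and 0.35 values in the example are assumptions (a commonly quoted rule of thumb), not measurements from this environment.

```python
def peer_failures_without_data_loss(rf):
    """With RF copies of each bucket, RF - 1 peers can fail without data loss."""
    return rf - 1


def peer_failures_keeping_searchable_copy(sf):
    """With SF searchable copies, SF - 1 peers can fail before a bucket
    has no searchable copy left."""
    return sf - 1


def storage_multiplier(rf, sf, rawdata_ratio, index_ratio):
    """Cluster-wide disk used per unit of ingested data.

    Every copy stores rawdata; only searchable copies also store index files.
    Both ratios are caller-supplied estimates.
    """
    return rf * rawdata_ratio + sf * index_ratio


rf, sf = 3, 2  # from the post
print("peers lost without data loss:", peer_failures_without_data_loss(rf))
print("peers lost with a searchable copy remaining:",
      peer_failures_keeping_searchable_copy(sf))
# 0.15 and 0.35 are assumed rule-of-thumb ratios, not figures from the thread.
print("disk per unit ingested:",
      round(storage_multiplier(rf, sf, 0.15, 0.35), 2))
```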
