Getting Data In

How well does an indexer configured with RAID 5 or 6 perform?

Chris_R_
Splunk Employee

If we have an indexer configured with a RAID 5 or RAID 6 array, is this going to negatively affect performance?

1 Solution

jrodman
Splunk Employee

RAID 6 is also parity-based. It just has two disks' worth of parity, so that a double drive failure still leaves you with your data. This means every write to RAID 6 causes writes to two disks in addition to the data writes. That sounds strictly worse, but the primary problem with parity is the additional reads: every time you want to write out data, you must recalculate parity against the other data in the stripe, which may mean fetching that data off disk. You pay either with forced-sequenced I/O (very slow) or with very large memory caching (memory that could otherwise serve your real workload). Typically you'll pay with some mix of both.
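To make the small-write cost concrete: on a 4-data + 1-parity RAID 5 set, rewriting a single block typically means reading the old data block and the old parity block, computing new parity = old parity XOR old data XOR new data, then writing the new data and the new parity. One logical write becomes four physical I/Os (six on RAID 6, with its second parity), versus two on a mirror-based layout like RAID 10, where the write simply lands on both copies.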

So in short, RAID 5 and RAID 6 should have the same type of performance costs. In fact, real-world RAID 5 typically features a few hot-standby drives, so the terms are quite blurry here. Traditional RAID 5 is unacceptably unreliable for most business needs: a second drive failure during the rebuild causes total data loss, and is statistically quite likely after the first failure. Thus in most circumstances people really mean some modified implementation of RAID 5, of which RAID 6 is one.

RAID 5/6/5+ can provide acceptable performance/cost tradeoffs for Splunk when the I/O load is simply not that high. However, if there is any chance of the I/O load growing high enough to tax the storage subsystem, something in the RAID 10, 01, 0+1, 1+0 family will provide drastically superior results.

dragmore
Explorer

Hi. I have a similar question. I'm building a new Splunk setup for an ISP with a 10 GB/day Splunk license. As you can see, the data amount isn't that big, but we want enough read performance for the NOC. What I'm suggesting is the following:

2x SLC SSDs = 256 GB striped, where we use, for example, 20 GB as HOT and 220 GB as WARM; 14x SAS disks in RAID 10 for COLD data. We aim to have about 2 years of data stored on the server. In addition we have backup over iSCSI.
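In indexes.conf terms I'm thinking roughly like this (the paths, index name, and size caps are just placeholders; note that hot and warm buckets share homePath, so the 20/220 GB split is really one ~240 GB SSD volume):

[volume:ssd]
path = /opt/splunk/ssd
# cap the striped SSD pair at ~240 GB (20 GB hot + 220 GB warm)
maxVolumeDataSizeMB = 240000

[volume:sas_raid10]
path = /opt/splunk/sas

[isp_logs]
homePath   = volume:ssd/isp_logs/db
coldPath   = volume:sas_raid10/isp_logs/colddb
# thawedPath cannot reference a volume
thawedPath = $SPLUNK_DB/isp_logs/thaweddb
# keep roughly two years of data before buckets are frozen
frozenTimePeriodInSecs = 63072000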

What are your comments on this approach?

Best regards, TE

gkanapathy
Splunk Employee

RAID 5/6-type storage is usually good for cold storage. The cold DB is usually written only when buckets are rolled, so you do not incur the write penalty that you would on the hot DB. Read-only performance of RAID 5/6 is almost comparable to RAID 0/10/01, assuming that the underlying disks and disk interfaces are the same, and that the number of data disks corresponds (e.g., RAID 5 over 5 disks vs. RAID 6 over 6 disks vs. RAID 0 over 4 disks vs. RAID 10 over 8 disks).

For hot disk performance, it's as k8to says above.

While RAID 10 is superior to RAID 5 in both performance and fault tolerance, this suggests that if your total storage is large enough (more than, say, 1 or 2 TB per indexer) and flexible enough, you can achieve moderate savings in disks, with only a little performance loss, by placing about 100 GB per index on RAID 10 for the hot DB and the remainder on RAID 5 for the cold DB.
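A minimal sketch of that split in indexes.conf, assuming hypothetical mount points for the two arrays (the 100 GB cap and paths are illustrative, not tested values):

[volume:raid10]
path = /mnt/raid10

[volume:raid5]
path = /mnt/raid5

[main]
homePath   = volume:raid10/main/db
coldPath   = volume:raid5/main/colddb
thawedPath = $SPLUNK_DB/main/thaweddb
# roll buckets to cold (RAID 5) once hot+warm on the RAID 10 array reach ~100 GB
homePath.maxDataSizeMB = 100000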

Chris_R_
Splunk Employee

Generally, on busy indexers it's not a good idea to go RAID 5 because of the extra work the parity calculation does on each write. If high performance is your main concern, consider RAID 10 (RAID 0+1), RAID 0, or RAID 01 instead of RAID 5.
Our documentation has further info on hardware specs:

http://www.splunk.com/wiki/Community:HardwareTuningFactors

RAID 6 may be as much of a concern as RAID 5, though I have not found many cases using RAID 6; hopefully someone else can chime in here...
