Filesystem for Splunk

cvajs
Contributor

So, under RHEL 6 and the latest Splunk, likely sitting on 16 10k spindles in RAID-5, is there a filesystem best suited for the job, given that both the syslog-ng files and the Splunk indexes are on the same filesystem?

dwaddle
SplunkTrust

(started as a comment, but couldn't fit in 500 chars... sorry)

A very quick read suggests that CacheCade may not provide the desired boost to Splunk's workload. Per Dell's documentation it is a read cache only, so it may not help substantially with the RAID-5 performance penalty for less-than-full-stripe writes. Splunk deals well with cold buckets on RAID-5 (because cold buckets are read-only), but much less so with hot buckets, because there are a lot of random writes to update both the rawdata and the index tsidx files.

To optimize for performance and capacity, my personal preference would be to put all 16 drives in the R720, and use 8 in a (4+4) RAID-10 for the hot buckets and 8 in a (7+P) RAID-5 for cold buckets and the operating system. Use Linux LVM to help sort out the filesystems appropriately. If you really want to use CacheCade, put it on the RAID-5 volume and let the operating system's filesystem cache in RAM take care of the RAID-10 volume.

cvajs
Contributor

So, a 2.4T RAID-10 for hot buckets, and a 3T RAID-5 (5+P, plus CacheCade, and I need one spare) for the OS and cold buckets. The problem is that I have syslog on the same system, and I expect a 60/40 write:read ratio between syslog writing and Splunk reading those files, so I suspect syslog has to go on the RAID-10. 2.4T won't be enough for my syslog data. I also plan to do some ext4 filesystem tuning, like noatime, data=ordered, nouser_xattr, which saves some writes.
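The capacity figures in this exchange can be checked with a small Python sketch. The 600 GB drive size is an assumption, inferred from the 2.4T quoted for a (4+4) RAID-10; the function names are made up for illustration and the numbers ignore filesystem and formatting overhead.

```python
def raid10_usable(drives, drive_gb):
    """Usable capacity of a RAID-10 set: half the drives hold mirror copies."""
    if drives < 4 or drives % 2:
        raise ValueError("RAID-10 needs an even number of drives, at least 4")
    return (drives // 2) * drive_gb


def raid5_usable(drives, drive_gb):
    """Usable capacity of a RAID-5 set: one drive's worth goes to parity."""
    if drives < 3:
        raise ValueError("RAID-5 needs at least 3 drives")
    return (drives - 1) * drive_gb


# Assumption: 600 GB drives, inferred from 2.4T for a (4+4) RAID-10.
DRIVE_GB = 600

print("RAID-10 (4+4):", raid10_usable(8, DRIVE_GB), "GB")
print("RAID-5 (5+P): ", raid5_usable(6, DRIVE_GB), "GB")
print("RAID-5 (7+P): ", raid5_usable(8, DRIVE_GB), "GB")
```

Under that assumption, the (7+P) layout suggested earlier gives more cold-bucket space than (5+P), at the cost of the slots used here for the CacheCade SSD and the spare.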

lguinn2
Legend

What dwaddle said!! Thanks dude 🙂

lguinn2
Legend

First and foremost - NOT RAID-5

RAID 10 (1+0) would be much, much better. The normal RHEL filesystem should be fine.

There is very good info in the first part of the Installation manual, and in the wiki:

http://wiki.splunk.com/Community:HardwareTuningFactors

http://docs.splunk.com/Documentation/Splunk/latest/Installation/Systemrequirements

cvajs
Contributor

If I run a crude calc using 1/3 reads and 2/3 writes in RAID-5, with 1k IOPS as my operating target and a write penalty of 4, I get something like this:
(1000*0.333)+(1000*0.666*4)
333+2664 = 2997 raw array IOPS to realize 1000 in RAID-5

14 spindles * 130 IOPS each * CacheCade factor = 1820*3.5 = 6370

So, probably not 20k, but if the OS can "see" 6370 IOPS in a RAID-5 split 33% read and 66% write, does my original statement still hold? Does it really matter that it's RAID-5?
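The crude calculation above can be sketched in Python. This is a minimal sketch, not a sizing tool: the write penalties (4 for RAID-5, 2 for RAID-10) are the usual rule-of-thumb values, the function names are hypothetical, and the 3.5 CacheCade factor is simply the estimate used in the post.

```python
def raw_iops_needed(target_iops, read_fraction, write_penalty):
    """Back-end (raw array) IOPS needed to deliver target_iops to the host."""
    write_fraction = 1.0 - read_fraction
    reads = target_iops * read_fraction
    writes = target_iops * write_fraction * write_penalty
    return reads + writes


def host_iops_from_raw(raw_iops, read_fraction, write_penalty):
    """Inverse: host-visible IOPS that a given raw array budget can sustain."""
    write_fraction = 1.0 - read_fraction
    return raw_iops / (read_fraction + write_fraction * write_penalty)


RAID5_PENALTY = 4   # rule of thumb: read data, read parity, write data, write parity
RAID10_PENALTY = 2  # rule of thumb: each write goes to both mirror halves

# 1/3 reads, 2/3 writes, 1000 IOPS target (figures from the post).
print(round(raw_iops_needed(1000, 1 / 3, RAID5_PENALTY)))
print(round(raw_iops_needed(1000, 1 / 3, RAID10_PENALTY)))

# Assumption: 14 spindles * 130 IOPS * 3.5 CacheCade factor, as in the post.
raw_budget = 14 * 130 * 3.5
print(round(host_iops_from_raw(raw_budget, 1 / 3, RAID5_PENALTY)))
```

If the 6370 figure is read as raw array capacity rather than what the OS sees, the same mix on RAID-5 works out to roughly a third of it at the host, which is one way to frame the RAID-5 versus RAID-10 question.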

cvajs
Contributor

In this specific case I am referencing Dell's R720 with one SSD CacheCade; tests run in RAID-10 using 15k drives yielded 32k+ IOPS. It's a skewed number due to CacheCade, and it all depends. 20k was my estimate based on the same tests using 10k drives.

reed_kelly
Contributor

How can you get 20000+ IOPS from 10K spindles? Did you mean SSD?

cvajs
Contributor

Thanks for the info, but I am not 100% convinced that RAID-5 is a bad choice. RAID-5 is certainly not better than a flavor of RAID-10, but if a 14-spindle RAID-5 array offers ~20000+ IOPS, does it really matter that it's RAID-5? The con of any flavor of RAID-10 is that you lose half the storage capacity. There are pros and cons to each, but I am not ready to say RAID-5 is a bad choice.
