Deployment Architecture

Skip cold and move from warm to frozen/deleted?

dmr1
Engager

I apologize for what I think is probably going to end up being a pretty dumb question, but I don't have a lot of experience with the internals of Splunk.

I think I understand pretty well why the hot->warm->cold->frozen path for data exists and how it's useful in a traditional SAN storage environment, where you might have lots of different types of disk and the data can be cleanly moved from one to the other. This is less clear to me in a clustered environment. If you're throwing a lot of indexing servers at the problem, each with their own disk, I don't quite understand the need to have cold buckets and suffer the copy from warm->cold.

As a hypothetical example, if you have 10 servers with 1,000 IOPS each, I don't see the benefit of carving out 3,000 IOPS for cold and 7,000 IOPS for hot when it's all the same "pool" of disk anyway. I'd rather just have all my data spread evenly across the physical disk. Even if I put the hot and cold buckets on the same physical disk, when the buckets roll to cold I have to pay a big I/O penalty, when I'd rather be using those IOPS for indexing new data or processing search requests.

So I guess I have three questions:

  1. Is this even a question that makes sense? Do I have a misunderstanding of some fundamental concept?
  2. If it does make sense, is it possible to skip cold buckets entirely and move from hot to warm to frozen/deleted?
  3. If it's not possible, what can be done to reduce the impact of the copy from warm to cold as much as possible?

sowings
Splunk Employee

Just make one big partition. Warm-to-cold is the first opportunity to change partitions (filesystem, etc.), but doing so is not required.
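
For example, here is a minimal indexes.conf sketch (the index name and mount point are placeholders, not recommendations) where homePath and coldPath sit on the same filesystem, so the warm-to-cold roll is just a rename rather than a copy:

  # indexes.conf - hot/warm and cold share one big partition
  [my_index]
  homePath   = /splunkdata/my_index/db
  coldPath   = /splunkdata/my_index/colddb
  thawedPath = /splunkdata/my_index/thaweddb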


Lucas_K
Motivator

The "point" of having the different places is that it gives you the possibility of having different tiers of storage performance.

Not everyone can keep multiple TB of data on expensive, fast disks, but people often still need to keep historic data for legal retention reasons. This way you can have your quick data AND long-term storage that is accessible whenever you want (just a little slower).

Personally, I wish they had an additional hot volume storage definition allowing for the separation of hot and warm as well!

You should be setting your retention times and index sizes to fit your storage policies.

http://docs.splunk.com/Documentation/Splunk/latest/admin/Indexesconf
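
As a rough sketch of what that tiering can look like (the volume names, paths, and sizes below are made up, purely illustrative), indexes.conf lets you define a fast volume for hot/warm buckets, a slower volume for cold, plus retention controls per index:

  # Hypothetical volumes - adjust paths and sizes to your own storage
  [volume:fast]
  path = /fast_disk/splunk
  maxVolumeDataSizeMB = 500000

  [volume:slow]
  path = /slow_disk/splunk
  maxVolumeDataSizeMB = 2000000

  [my_index]
  homePath   = volume:fast/my_index/db
  coldPath   = volume:slow/my_index/colddb
  # thawedPath cannot use a volume reference
  thawedPath = $SPLUNK_DB/my_index/thaweddb
  # Freeze (archive or delete) data older than ~1 year, or when the index hits its size cap
  frozenTimePeriodInSecs = 31536000
  maxTotalDataSizeMB     = 1500000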


lguinn2
Legend

For one thing, you aren't carving up the IOPS - only the physical space - although I understand that you don't want to waste your I/O capacity just copying data around. Your data will be spread evenly across the physical disk if you put both the db and the colddb directories on the same disk volume. I like doing this, because then the "roll" from warm to cold is a simple rename within the same filesystem (on Linux), not a copy operation. So Splunk has already optimized this.

You could configure the db space to be as large as the max size of the index. Then the colddb directory would be unused. I don't like this. For one thing, separating the older (cold) data from newer data generally improves search speed. Also, it makes it easier to do backups. I prefer to size db so that it holds approximately 30 days of data. (This assumes that you are planning to keep 6 - 12 months of data online in Splunk.)

Based on Splunk's experience, the majority of searches examine only the last 24 hours of data, and about 80% examine 30 days or less. Keeping the db directory of warm buckets sized for about 30 days means that, for most searches, Splunk needs to look in only a single directory.
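
One way to approximate that 30-day sizing (the numbers here are illustrative only - plug in your own daily indexing volume): if an index accumulates roughly 10 GB of data on disk per day, then about 300 GB covers 30 days of warm buckets, and the rest of the index's budget can live in cold:

  # Assumes ~10 GB/day on disk, so ~300 GB is roughly 30 days of warm buckets
  [my_index]
  homePath   = $SPLUNK_DB/my_index/db
  coldPath   = $SPLUNK_DB/my_index/colddb
  thawedPath = $SPLUNK_DB/my_index/thaweddb
  homePath.maxDataSizeMB = 300000      # cap on warm (db) space
  maxTotalDataSizeMB     = 1200000     # cap on the whole index (warm + cold)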

Finally, having a strategy for warm vs. cold means that I can someday, if I must, easily move the cold data to another volume. This could happen if my index grows a lot - although I often find that I need to create new indexes for new kinds of data, and I can choose to put different indexes on different volumes...
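
If that day comes, repointing coldPath is the main change (paths below are hypothetical); roughly: stop Splunk, copy the existing colddb contents to the new location, update the setting, and restart:

  # Relocate cold buckets to a bigger/slower volume; hot/warm stay where they are
  [my_index]
  homePath   = $SPLUNK_DB/my_index/db
  coldPath   = /archive_disk/splunk/my_index/colddb
  thawedPath = $SPLUNK_DB/my_index/thaweddb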

I hope this helps. You might want to take a look at this wiki article as well: Things I wish I knew then
