Can I configure Splunk to point to a thawedPath on...

joonoyang · 08-11-2016

Hi,

I'm designing a Splunk deployment on AWS and trying to work out how many days of data we can retain within our budget. It looks like if we use both types of storage together, S3 and EBS, storing thawed data on S3 and the remaining data on EBS, I could get a longer retention period. Does that make sense?

Thanks,
Joon

Jeremiah · 08-13-2016

First, I think your terminology is a little off. Thawed data is data you have temporarily restored to a Splunk server to make it searchable again. It sounds like you may be thinking of storing frozen data on S3, which, yes, you can do.

There are a lot of articles and answers about the data lifecycle in Splunk, but the basics are these: your data is stored in buckets, which you can think of as chunks of data organized by time, and each bucket moves through the hot, warm, cold, and finally frozen phases. Hot, warm, and cold buckets are all searchable from Splunk. Once a bucket is frozen, however, it is no longer searchable, because it has been deleted from your Splunk server. You can choose to archive the bucket to another location at the time it's frozen; for example, you could use a script to archive it to S3. Then, if you later need to search that bucket again, you can copy it back to your Splunk server and make it searchable. Taking a frozen bucket and making it searchable again is called "thawing".
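To make the freeze/thaw cycle concrete, here is a minimal sketch of the two halves in Python. It is not from the docs above and makes some loud assumptions: FROZEN_ARCHIVE_ROOT is a hypothetical local directory that some separate job copies to S3, archive_bucket and thaw_bucket are hypothetical helper names, and only the bucket's rawdata/ directory is archived because Splunk can rebuild the index files when you thaw. The only real Splunk pieces used are the bucket directory that Splunk passes to a coldToFrozenScript, the thawedPath directory, and the `splunk rebuild <bucket dir>` command.

```python
"""Hypothetical freeze/thaw helpers for Splunk buckets (a sketch, not production code).

Assumptions: FROZEN_ARCHIVE_ROOT is a local directory that a separate job
copies to S3, and only rawdata/ is archived because Splunk can rebuild the
index files when the bucket is thawed.
"""
import os
import shutil
import sys

# Hypothetical default; override with the FROZEN_ARCHIVE_ROOT environment variable.
ARCHIVE_ROOT = os.environ.get("FROZEN_ARCHIVE_ROOT", "/opt/frozen-archive")


def archive_bucket(bucket_dir, archive_root=ARCHIVE_ROOT):
    """Copy <bucket>/rawdata to <archive_root>/<index dir>/<bucket name>; return the destination."""
    bucket_dir = os.path.normpath(bucket_dir)
    bucket_name = os.path.basename(bucket_dir)
    # Bucket paths look like .../<index dir>/colddb/<bucket name>.
    index_dir = os.path.basename(os.path.dirname(os.path.dirname(bucket_dir)))
    rawdata = os.path.join(bucket_dir, "rawdata")
    if not os.path.isdir(rawdata):
        raise FileNotFoundError(f"no rawdata directory in {bucket_dir}")
    dest = os.path.join(archive_root, index_dir, bucket_name)
    if os.path.exists(dest):
        shutil.rmtree(dest)  # discard a partial copy left by an earlier failed run
    shutil.copytree(rawdata, os.path.join(dest, "rawdata"))
    return dest


def thaw_bucket(archived_bucket, thawed_path):
    """Copy an archived bucket into the index's thawedPath; return the rebuild command."""
    target = os.path.join(thawed_path, os.path.basename(os.path.normpath(archived_bucket)))
    shutil.copytree(archived_bucket, target)
    # The index files were not archived, so Splunk has to rebuild them before searching.
    return ["splunk", "rebuild", target]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Splunk calls coldToFrozenScript with the bucket directory as the first argument.
    # An uncaught exception exits non-zero, so Splunk should keep the bucket and retry.
    print("archived to", archive_bucket(sys.argv[1]))
```

You would point the coldToFrozenScript setting in indexes.conf at a script like this. If a local directory is all you need, the built-in coldToFrozenDir setting does a similar rawdata-only copy without any script; a custom script is mainly worth it when you want to push buckets straight to S3 or add your own checks.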

Here's a set of links from the docs you may want to look through:
https://docs.splunk.com/Documentation/Splunk/6.4.2/Indexer/HowSplunkstoresindexes
https://docs.splunk.com/Documentation/Splunk/6.4.2/Indexer/Setaretirementandarchivingpolicy
https://docs.splunk.com/Documentation/Splunk/6.4.2/Indexer/Automatearchiving
https://docs.splunk.com/Documentation/Splunk/6.4.2/Indexer/Restorearchiveddata

In terms of mixing different types of storage, tuning retention, and minimizing costs, you have several options with AWS. How you set things up comes down to your budget and to how searchable, performant, and durable your data needs to be. You can use a mix of instance storage, EBS, S3, and Glacier to retain data for a very long time. From a Splunk application perspective, you can put hot/warm and cold data on different volume types, you can enable tsidx reduction to lower your storage requirements (at some cost to search performance), and you can use Hunk to search data in EMR/S3.
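Since the original question is really about how many days fit in a budget, a back-of-the-envelope calculation per tier can help. This is a rough sketch with hypothetical inputs: the capacities and the 100 GB/day ingest are made up, and the on-disk ratios (about 0.5 of raw for searchable buckets, about 0.15 for rawdata-only frozen archives) are commonly quoted rules of thumb, not measurements. Measure your own data, and note that tsidx reduction would lower the ratio for older searchable buckets by an amount that depends on your data.

```python
"""Back-of-the-envelope retention estimate per storage tier.

Every number here is a hypothetical input; replace it with your own measurements.
"""


def retention_days(capacity_gb, daily_ingest_gb, disk_ratio):
    """Days of data that fit in capacity_gb.

    disk_ratio = on-disk size / raw ingest size for data stored on this tier.
    """
    if daily_ingest_gb <= 0 or disk_ratio <= 0:
        raise ValueError("daily_ingest_gb and disk_ratio must be positive")
    return capacity_gb / (daily_ingest_gb * disk_ratio)


def plan(daily_ingest_gb, tiers):
    """tiers maps name -> (capacity_gb, disk_ratio); returns name -> days retained."""
    return {name: retention_days(cap, daily_ingest_gb, ratio)
            for name, (cap, ratio) in tiers.items()}


if __name__ == "__main__":
    # Hypothetical tiers: searchable buckets (rawdata + tsidx) at ~0.5 of raw,
    # frozen archives that keep only compressed rawdata at ~0.15 of raw.
    tiers = {
        "hot/warm (fast EBS)": (1_000, 0.5),
        "cold (cheaper EBS)": (5_000, 0.5),
        "frozen archive (S3)": (50_000, 0.15),
    }
    days = plan(100, tiers)  # assumed 100 GB/day of raw ingest
    for name, d in days.items():
        print(f"{name}: {d:.0f} days")
    searchable = days["hot/warm (fast EBS)"] + days["cold (cheaper EBS)"]
    print(f"searchable without thawing: {searchable:.0f} days")
```

The point of splitting by tier is that only the hot/warm and cold days are searchable directly; days sitting in the S3 archive have to be thawed before you can search them.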

Can I configure Splunk to point to a thawedPath on S3 and have none of these paths stored on EBS?
