
Can I configure Splunk to point to a thawedPath on S3 and have none of these paths stored on EBS?

joonoyang
Engager

Hi,

I'm designing a Splunk deployment on AWS and working out how many days of data we can retain within our budget. It looks like if we use both types of storage together, S3 and EBS, by storing thawed data on S3 and the remaining data on EBS, we could get a longer retention period. Does that make sense?

Thanks,
Joon


Jeremiah
Motivator

First, I think your terminology is a little off. Thawed data is data you have temporarily restored to a Splunk server to make it searchable again. It sounds like you may be thinking of storing frozen data on S3, which, yes, you can do.

There are a lot of articles and answers related to the data lifecycle in Splunk. The basics are that your data moves through the phases hot, warm, cold, and then frozen, in buckets, which you can think of as chunks of data organized by time. Hot, warm, and cold buckets are all searchable from Splunk. Once a bucket is frozen, it is no longer searchable, because it has been deleted from your Splunk server. You can choose to archive the bucket to another location at the time it's frozen. For example, you could use a script to archive the bucket to S3. Then, at some point in the future, if you needed to search that bucket again, you could copy it back to your Splunk server and make it searchable. This step of taking a frozen bucket and making it searchable again is called "thawing".
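To make the "script to archive the bucket to S3" idea concrete, here is a rough sketch of what such a script could look like in Python. It assumes Splunk invokes it with the bucket directory as its only argument (as it does for a coldToFrozenScript), that the AWS CLI is installed and has credentials on the indexer, and that bucket directories follow the usual db_<newest>_<oldest>_<id> naming. The S3 location and the function names are made up for illustration, so treat this as a starting point, not a tested archiving solution.

```python
import re
import subprocess
import sys
from pathlib import Path

# Hypothetical destination; replace with your own S3 bucket and prefix.
S3_PREFIX = "s3://example-splunk-frozen/archive"

# Bucket directories are normally named db_<newest>_<oldest>_<id>.
# Replicated copies in a cluster start with rb_ and may carry a GUID suffix.
BUCKET_RE = re.compile(r"^(db|rb)_(\d+)_(\d+)_(\d+)(?:_[0-9A-Fa-f-]+)?$")


def parse_bucket_name(name):
    """Return the newest/oldest event times (epoch seconds) and local id."""
    m = BUCKET_RE.match(name)
    if not m:
        raise ValueError(f"not a Splunk bucket directory name: {name!r}")
    return {"newest": int(m.group(2)), "oldest": int(m.group(3)), "id": int(m.group(4))}


def s3_destination(bucket_path, prefix=S3_PREFIX):
    """Build the S3 URI for a bucket, grouped by index directory name.

    Assumes the common layout .../<index>/colddb/<bucket>.
    """
    p = Path(bucket_path)
    parse_bucket_name(p.name)  # fail early on unexpected paths
    index_dir = p.parent.parent.name
    return f"{prefix}/{index_dir}/{p.name}"


def archive(bucket_path):
    """Copy the whole bucket directory to S3 with the AWS CLI."""
    cmd = ["aws", "s3", "cp", str(bucket_path), s3_destination(bucket_path), "--recursive"]
    return subprocess.run(cmd).returncode


# Only act when called with exactly one argument (the bucket path).
# A non-zero exit code signals failure, so the bucket is not treated as archived.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(archive(sys.argv[1]))
```

Keeping the index name and the original bucket directory name in the S3 key means that, when you later want to thaw a bucket, you can tell from the key alone which index it belongs to and which time range it covers.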

Here's a set of links from the docs you may want to look through:
https://docs.splunk.com/Documentation/Splunk/6.4.2/Indexer/HowSplunkstoresindexes
https://docs.splunk.com/Documentation/Splunk/6.4.2/Indexer/Setaretirementandarchivingpolicy
https://docs.splunk.com/Documentation/Splunk/6.4.2/Indexer/Automatearchiving
https://docs.splunk.com/Documentation/Splunk/6.4.2/Indexer/Restorearchiveddata

In terms of mixing different types of storage, tuning retention, and minimizing costs, you have several options with AWS. How you set things up boils down to your budget and to how searchable, performant, and durable your data must be. You can use a mix of instance storage, EBS, S3, and Glacier to retain data for a very long time. From a Splunk application perspective, you can keep hot/warm and cold data on different volume types, you can enable tsidx reduction to lower your storage requirements (at the cost of search performance), and you can use Hunk to search data in EMR/S3.
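Since the original question was about how many days fit in a given budget, a back-of-the-envelope calculation like the one below may help when sizing the hot/warm and cold volumes. Everything here is an assumption for illustration: the function name, the volume sizes, the daily ingest, and especially the 0.5 disk ratio, which is only a rough planning figure for how much disk indexed data takes relative to raw data and varies a lot by data type (and drops further with tsidx reduction).

```python
def retention_days(volume_gb, daily_raw_gb, disk_ratio=0.5):
    """Estimate how many days of data fit on a volume.

    disk_ratio is the assumed on-disk size of indexed data relative
    to the raw data ingested (a rough guess; measure your own data).
    """
    if volume_gb < 0 or daily_raw_gb <= 0 or disk_ratio <= 0:
        raise ValueError("sizes must be positive")
    return volume_gb / (daily_raw_gb * disk_ratio)


# Hypothetical example: 20 GB/day of raw data, a 500 GB faster volume
# for hot/warm buckets and a 2000 GB cheaper volume for cold buckets.
daily_raw_gb = 20
hot_warm_days = retention_days(500, daily_raw_gb)
cold_days = retention_days(2000, daily_raw_gb)
searchable_days = hot_warm_days + cold_days

print(f"hot/warm: {hot_warm_days:.0f} days, cold: {cold_days:.0f} days, "
      f"searchable total: {searchable_days:.0f} days")
```

Anything older than the searchable total would be frozen, which is where archiving to S3 or Glacier extends how long you keep the data without adding to the EBS bill.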
