Implement Retention Policy

a5003976
Explorer

Hi all,
Our customer wants to implement a policy that keeps the logs of the last six months, counted from the moment we look.
Retention here means that Splunk always holds a rolling time window of 6 months.

Can I implement such a policy?

Thanks.

a5003976
Explorer

Thanks, all, for the replies.
Then, if I apply this to my config:

 [YourIndexName]
 homePath   = $SPLUNK_DB/YourIndexName/db
 coldPath   = $SPLUNK_DB/YourIndexName/colddb
 thawedPath = $SPLUNK_DB/YourIndexName/thaweddb
 #Rollover data bucket everyday
 maxHotIdleSecs = 86400
 #Keep data searchable for 180 days
 frozenTimePeriodInSecs = 15552000

When I specify a maximum life for a hot bucket (maxHotIdleSecs) and Splunk then rolls it to warm, is my data still available in Splunk? Is the data still searchable?

In this case, do I always have the latest 180 days searchable in Splunk? For example, on 30 October, will the logs be available on Splunk until November 30?

Thanks for your patience 🙂

Thanks a lot.

Richfez
SplunkTrust

This isn't quite what you originally asked - if your question is how to make sure Splunk keeps at least 180 days of history (and not exactly 180 days), that's pretty easy because it's just disk space. Don't set any of the above (or, well, leave them at their defaults) and just let Splunk accumulate data.

As long as you aren't talking about an insane amount of data (definitions of which vary considerably, but in this case, I'm thinking if your expected 6 months of retention is less than 10 TB or 20 TB), this isn't very difficult. Above that, it's not hard either but it can take quite a bit more disk.

Your only parameter to adjust may just be "maxTotalDataSizeMB" per index, and that adjustment is just to make sure it is big enough to store 6 months or more. You might tweak maxDataSize from "auto_high_volume" as well, but this can be changed later. Actually, BOTH of those can be changed after you get a month's worth of data in and see just how big it ends up being.
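
As a rough sketch of that sizing step, you could extrapolate from the first month of data. The function name, the 20% headroom, and the sample numbers below are illustrative assumptions, not Splunk defaults or measurements:

```python
def estimate_max_total_data_size_mb(observed_mb, observed_days,
                                    retention_days, headroom_percent=20):
    """Extrapolate the index size (in MB) needed to hold retention_days of data.

    observed_mb / observed_days: size the index reached over a sample period.
    headroom_percent: extra margin for growth (an assumed value, tune it).
    Integer inputs are expected; the result is rounded up to a whole MB.
    """
    numerator = observed_mb * retention_days * (100 + headroom_percent)
    denominator = observed_days * 100
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-numerator // denominator)


# Hypothetical example: the index reached 50,000 MB after 30 days,
# and 183 days of retention are wanted.
print(estimate_max_total_data_size_mb(50_000, 30, 183))
```

The result is only a starting point for maxTotalDataSizeMB; re-run the estimate as the real daily volume becomes clearer.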

You will definitely want to implement something like woodcock posted above to keep track of how much disk space is in use and add disk if necessary before you run out of space.

woodcock
Esteemed Legend

The other answer and comments are all correct, but there is a reason that I asked exactly what you need: you need to check "both ends". If the data for your index grows so large that 6 months no longer fits, Splunk will freeze buckets sooner than your frozenTimePeriodInSecs value specifies, and you need to know this. Whenever a bucket is frozen, a log entry indicates it. The other answers make sure that you expire data when you should, but you ALSO need a handle on the opposite: making sure that you are NOT expiring data when you should NOT be. To do this, you need to track that 6 months' worth of data always ("still") fits within the file space that you have allotted for this index, by setting up an alert or periodic report based on this search:

index=_internal sourcetype=splunkd bucketmover "will attempt to freeze"
| rex field=_raw "/splunkdata(?:/[^/]*)?/(?<indexname>[^/]*)/db/db_(?<newestTime>[^_]*)_(?<oldestTime>[^_]*)_.*"
| dedup indexname
| eval retentionDays=(now()-oldestTime)/(60*60*24)
| stats values(retentionDays) as retentionDays by indexname
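
To make the arithmetic in that search easier to follow, here is a minimal Python sketch of the same idea: pull the two epoch timestamps out of a bucket directory name and turn the second one into a retention figure in days. The sample path and index name are hypothetical, and the regex is a simplified stand-in for the rex above, not a copy of it:

```python
import re
import time

# Bucket directories are named db_<newestTime>_<oldestTime>_<id>, as the rex
# in the search assumes. This pattern is a simplified, illustrative version.
BUCKET_RE = re.compile(r"/db/db_(?P<newest>\d+)_(?P<oldest>\d+)_")


def retention_days(bucket_path, now=None):
    """Return the age in days of the oldest event in the bucket, or None."""
    match = BUCKET_RE.search(bucket_path)
    if match is None:
        return None
    if now is None:
        now = time.time()
    # Same formula as the eval step: (now() - oldestTime) / (60*60*24)
    return (now - int(match.group("oldest"))) / (60 * 60 * 24)


# Hypothetical bucket path; "myindex" and the timestamps are made up.
sample = "/splunkdata/myindex/db/db_1700000000_1690000000_12"
print(retention_days(sample, now=1705552000))
```

If the figure reported for an index drops below the retention you promised, buckets are being frozen for size reasons rather than age.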

Richfez
SplunkTrust

Good points.

somesoni2
Revered Legend

Data retention applies at the bucket level (a bucket is frozen only once the newest event stored in it is older than the retention period), so to keep data for 6 months/180 days you should roll over the bucket (from hot to warm) every day. Something like this should work fine:

[YourIndexName]
homePath   = $SPLUNK_DB/YourIndexName/db
coldPath   = $SPLUNK_DB/YourIndexName/colddb
thawedPath = $SPLUNK_DB/YourIndexName/thaweddb
#Rollover data bucket everyday
maxHotIdleSecs = 86400
#Keep data searchable for 180 days
frozenTimePeriodInSecs = 15552000

a5003976
Explorer

Hi somesoni2,
Thanks for the reply. But in this case, with a bucket rolled every day, will I always have the latest 6 months of data available and searchable? The customer wants a window of events available for 6 months.

Example:
If today is 13 April, logs must be available until 13 October; tomorrow, 14 April, logs should be available on Splunk until 14 October, and so on!
This should work for a window of 6 months.

Thanks.

Richfez
SplunkTrust

a5003976,

What you are asking for is indeed what somesoni2's solution provides. Don't worry about buckets rolling over from hot to warm or whatever; somesoni2 is completely right in his choice of this parameter:

[YourIndexName]
frozenTimePeriodInSecs = 15552000

That will make data older than that many seconds go away. That's 180 days, though, NOT 6 months. To get 6 months precisely, you'll need two things. First, roll data to frozen after 15811200 seconds (which is 183 days; that covers 6 months as of today, though the exact number of days in 6 months varies with the months involved). Then also add earliest=-6mon to the dashboards and reports. The earliest parameter will trim it to a precise, beautiful 6 months.

[YourIndexName]
frozenTimePeriodInSecs = 15811200
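
For anyone checking the numbers, the two frozenTimePeriodInSecs values in this thread are plain day counts multiplied by 86400. A small sketch (the helper names are just for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def days_to_seconds(days):
    """Convert a whole number of days to seconds."""
    return days * SECONDS_PER_DAY


def seconds_to_days(seconds):
    """Convert seconds to (possibly fractional) days."""
    return seconds / SECONDS_PER_DAY


# The two values used in this thread.
for secs in (15552000, 15811200):
    print(secs, "seconds =", seconds_to_days(secs), "days")
# 15552000 seconds = 180.0 days
# 15811200 seconds = 183.0 days
```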

Then in your searches, add this:

mysearch earliest=-6mon sourcetype=companyA | my other search stuff...

So what's that get you, time-wise?

The OLDEST data they will see, as of today, October 25th, would be April 25th.

The oldest data they COULD see, if they could change the search, would be 183 days, or April 25th as well. Again if they could change the search, at other times of the year they could remove the earliest parameter and perhaps see 6 months + 1 more day. Probably not an issue, and if it is I'd look seriously into telling them to change the policy. 🙂

And, if they want to change the time picker to last 30 days to see more detail, they could and it would only show them 30 days.

But when they change it back to "All time" it would show only back to April 25th.
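
A quick way to sanity-check the day counts above is to measure a six-calendar-month window directly. This is only a sketch: the year 2023 is an arbitrary choice (the thread does not name one), the helper names are made up, and it assumes "six months earlier" means the same day of the month six months back:

```python
from datetime import date


def six_months_before(d):
    """Same day-of-month six calendar months earlier.

    Assumes that day exists in the target month (true for the 25th).
    """
    month_index = d.year * 12 + (d.month - 1) - 6
    return date(month_index // 12, month_index % 12 + 1, d.day)


def window_days(d):
    """Number of days between d and the date six calendar months earlier."""
    return (d - six_months_before(d)).days


# October 25th back to April 25th, as in the example above.
print(window_days(date(2023, 10, 25)))  # 183

# The same window measured on the 25th of every month of that year.
lengths = [window_days(date(2023, m, 25)) for m in range(1, 13)]
print(min(lengths), max(lengths))  # 181 184
```

So 183 days matches the October 25th example, but by this calculation a six-month window can reach 184 days at other times of the year. If the retained data must never fall short of six calendar months, 184 days (15897600 seconds) may be the safer setting, with earliest=-6mon still doing the exact trimming.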

woodcock
Esteemed Legend

Do you mean you would like to:

A: keep logs at least 6 months (no shorter)
B: keep logs exactly 6 months (no longer)
C: Report on current retention so that you can make adjustments if you need to.

Really "C" is going to be the way to go in any case, I think.

a5003976
Explorer

Hi woodcock,
The customer wants logs available on Splunk (searchable via the web) for exactly the latest 6 months.
