Getting Data In

Fast way to find size of events on disk for a time period

hajducko
Explorer

Oftentimes we are tasked with deleting data out of an index to trim it down. Generally, we do this by setting frozenTimePeriodInSecs and allowing Splunk to remove old data.

However, before we do that, we need to figure out where the best bang for the buck is. Many times people just ask us for the oldest event and try to plan around that, but this is usually a very small bit of the data (some old remnants of events that somehow got indexed way later). To figure out the earliest point at which there's a large chunk of data, we've been running something like this:

index=myindex | timechart count span=1w

And running that across All Time. The problem is that it's slow, especially for our bigger customers, who are often in the 1 billion event range for a few months of data. Plus, we're just doing event counts rather than getting the real size of the data (or even the compressed on-disk size). I've played with using len(_raw) and summing it up, but that just made things even slower.
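For reference, the len(_raw) attempt looked roughly like this (the index name is just a placeholder):

index=myindex | eval raw_bytes=len(_raw) | timechart span=1d sum(raw_bytes) AS raw_bytes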

Is there a quicker way to do this? I've played with dbinspect, but that doesn't give the breakdown we'd need. I've also thought about doing this as an ongoing search with a scripted input that runs daily to get the index sizes on disk, but I realized that this doesn't give us a picture of what Splunk looks like right now. The second we delete any data, all that historical data becomes obsolete. Ideally, I'd like to get the current size of the index data, per index, per day.

Anyone have any ideas?


V_at_Splunk
Splunk Employee

Hi hajducko,

If you get no joy with Search Language, consider a Perl or Python script. Given that Splunk stores LT/ET (event latest time/earliest time) both per index and per bucket (details vary by release, and as you haven't given your Splunk version I cannot advise), you should be able to get approximations quickly, and without tying up the host's resources much.

Well OK, the per-bucket details haven't changed across releases: a bucket called db_1397097601_1396910519_45 contains events with LT=1397097601 (the first long number) and ET=1396910519 (the second long number). Both LT and ET are epoch seconds.
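To make that naming convention concrete, here is a quick sketch (on a release that has makeresults; the bucket name is just the example from above) that pulls LT/ET out of a bucket directory name and turns them into readable timestamps:

| makeresults | eval bucket="db_1397097601_1396910519_45" | rex field=bucket "db_(?<lt>\d+)_(?<et>\d+)_(?<bucket_id>\d+)" | eval latest=strftime(tonumber(lt), "%Y-%m-%d %H:%M:%S"), earliest=strftime(tonumber(et), "%Y-%m-%d %H:%M:%S") | table bucket lt et latest earliest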


martin_mueller
SplunkTrust

I think you're actually on the right track with dbinspect... it's just not that easy to read the first time round. Consider this query:

| dbinspect index=_internal | table index id state eventCount sizeOnDiskMB modTime startEpoch endEpoch | sort + endEpoch | eval age = now()-endEpoch | eval humanAge = tostring(age, "duration") | streamstats sum(sizeOnDiskMB) as recoverableDiskMB

That'll generate a table like this:

[Screenshot: dbinspect results table, one row per bucket sorted by endEpoch, with the age, humanAge, and running recoverableDiskMB columns]

Using this table, you can see which bucket(s) would be next in line to be deleted according to the frozen period and when, as well as how much space would be recovered. If your task were to free up 50 MB of space, you'd see that all buckets up to 393/394 would have to go, giving you a desired frozen time of about 1,200,000 seconds according to the age column.
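If you want something closer to the per-index/per-day view you mentioned, a rough variant is to bin buckets by the day of their endEpoch (only an approximation, since buckets span day boundaries, and it assumes your version's dbinspect returns endEpoch and sizeOnDiskMB):

| dbinspect index=myindex | eval day=strftime(endEpoch, "%Y-%m-%d") | stats sum(sizeOnDiskMB) as sizeOnDiskMB sum(eventCount) as eventCount by index day | sort index day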

hajducko
Explorer

Well, one of the issues was that these are on Splunk 5. It does appear to work better with Splunk 6. I'm guessing I'll shoot the results over to a summary index on each indexer and then search off that.
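Something like this as a daily scheduled search on each indexer, for example (the index/source names are just placeholders, and it assumes dbinspect accepts a wildcarded index on this version):

| dbinspect index=* | eval snapshot_time=now() | collect index=summary source="bucket_sizes"

Then the reporting is just a search against index=summary source="bucket_sizes".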


martin_mueller
SplunkTrust

Well, the overlap is inherent regardless of how you determine the frozen period because Splunk will only delete whole buckets once the latest event has grown old enough.

Getting that to work in a distributed search environment may be tricky indeed... but there must be a workable way somehow. If all else fails, one could write a custom search command that gets the list of search peers from /services/search/distributed/peers, runs one search directly against each of them, merges the results along with a splunk_server field, and returns the whole shebang.
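For what it's worth, you can at least see the peer list such a command would iterate over with something like this (the returned field names may vary by version):

| rest /services/search/distributed/peers splunk_server=local | table title status version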


hajducko
Explorer

Hrm, well, not exactly what I wanted, but it does give a general sense of how far back you need to go to reach a decent chunk of data, which was the intention of the search. The only issue is that the line items aren't really per day, and there can be quite a bit of overlap between many of the buckets.

The other problem is that dbinspect only works at the indexer level, with no way to distribute it to the indexers (that I'm aware of) besides an app/add-on.
