I was struggling to find short- and long-term estimates of how much space each index takes in each state, so if you are trying to make a plan or are taking over an older deployment, your two friends are dbinspect
and the Monitoring Console.
Seriously, try to avoid splunkd's internal metrics unless you are looking at license volume.
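(If license volume is what you're after, a common starting point is the license usage log itself. This is a sketch assuming the default `license_usage.log` fields, not part of the main search below:)

| search index=_internal source=*license_usage.log type=Usage
| timechart span=1d sum(b) as licensedBytes by idx

Note this measures licensed (ingested raw) volume per index, which is a different number from what ends up on disk.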
The Monitoring Console deserves a course of its own, but using dbinspect I was able to find, by bucket and state, the compressed and uncompressed volumes, while working out a decent estimate of how I should configure my indexes.conf.
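If you just want a quick look before running the full search, a minimal dbinspect query (a sketch using the same fields the full search relies on) shows bucket counts and on-disk size per index and state:

| dbinspect index=*
| stats dc(bucketId) as buckets sum(sizeOnDiskMB) as sizeOnDiskMB by index, state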
The search I used (which desperately needs cleaning, as it has plenty of unnecessary stats tables written into it):
| dbinspect index=*
| search tsidxState="full" bucketId=*
| eval ageDays=round((endEpoch-startEpoch)/86400,10)
| stats min(startEpoch) as MinStartTime max(startEpoch) as MaxStartTime min(endEpoch) as MinEndTime max(endEpoch) as MaxEndTime max(hostCount) as MaxHosts max(sourceTypeCount) as MaxSourceTypes sum(eventCount) as TotalEvents sum(rawSize) as rawSizeBytes sum(sizeOnDiskMB) as sizeOnDiskBytes values(ageDays) as ageDays dc(bucketId) as countBuckets by index, bucketId, state
| where ageDays<90 AND ageDays>0.0000000000
| eval sizeOnDiskBytes=round(sizeOnDiskBytes*pow(1024,2))
| eval dailyDisk=round(sizeOnDiskBytes/ageDays,5)
| eval dailyRaw=round(rawSizeBytes/ageDays,5)
| eval dailyEventCount=round(TotalEvents/ageDays)
| table index bucketId state dailyDisk ageDays rawSizeBytes sizeOnDiskBytes TotalEvents dailyRaw dailyEventCount
| stats sum(dailyDisk) as dailyBDiskBucket, values(ageDays), sum(dailyRaw) as dailyBRaw sum(dailyEventCount) as dailyEvent, avg(dailyDisk) as dailyBDiskAvg, avg(dailyRaw) as dailyBRawAvg, avg(dailyEventCount) as dailyEventAvg, dc(bucketId) as countBucket by index, state, ageDays
| eval bPerEvent=round(dailyBDiskBucket/dailyEvent)
| eval bPerEventRaw=round(dailyBRaw/dailyEvent)
| table dailyBDiskBucket index ageDays dailyEvent bPerEvent dailyBRaw bPerEventRaw state
| sort ageDays
| stats sum(dailyBDiskBucket) as Vol_totDBSize, avg(dailyBDiskBucket) as Vol_avgDailyIndexed, max(dailyBDiskBucket) as Vol_largestVolBucket, avg(dailyEvent) as avgEventsPerDay, avg(bPerEvent) as Vol_avgVolPerEvent, avg(dailyBRaw) as Vol_avgDailyRawVol, avg(bPerEventRaw) as Vol_avgVolPerRawEvent, range(ageDays) as rangeAge by index, state
| foreach Vol_* [eval <<FIELD>>=if(<<FIELD>> >= pow(1024,3), tostring(round(<<FIELD>>/pow(1024,3),3))+" GB", if(<<FIELD>> >= pow(1024,2), tostring(round(<<FIELD>>/pow(1024,2),3))+" MB", if(<<FIELD>> >= 1024, tostring(round(<<FIELD>>/1024,3))+" KB", tostring(round(<<FIELD>>)) + " bytes")))]
| rename Vol_* as *
| eval comb="Index Avg/day: " + avgDailyIndexed + "," + "Raw Avg/day: " + avgDailyRawVol + "," + "DB Size: " + totDBSize + "," + "Per Event Avg/Vol: " + avgVolPerEvent + "," + "Retention Range: " + tostring(round(rangeAge))
| eval comb = split(comb,",")
| xyseries index state comb
| table index hot warm cold
This search helped a lot in knowing where to move forward with configuration changes. Hopefully this helps you avoid the trip into wonderland. The main value you want to look at is Index Avg/day:
that's the compressed value, what is actually written to disk.
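As a worked example of turning that number into settings: say the search reports an Index Avg/day of roughly 2 GB and you need 90 days of retention. A first-pass indexes.conf stanza might look like the sketch below. The index name, paths, and the ~25% headroom are my assumptions, not output of the search:

[my_index]
homePath   = $SPLUNK_DB/my_index/db
coldPath   = $SPLUNK_DB/my_index/colddb
thawedPath = $SPLUNK_DB/my_index/thaweddb
# 2 GB/day * 90 days = 180 GB = 184320 MB; ~25% headroom -> ~230400 MB
maxTotalDataSizeMB = 230400
# 90 days * 86400 seconds before buckets roll to frozen
frozenTimePeriodInSecs = 7776000

Size against the Index Avg/day (compressed) figure, not the raw figure, since maxTotalDataSizeMB caps what is on disk.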
Oh! Also, make sure the search is run over All Time, and if you are running an indexer cluster, run the query on the master; otherwise run it on the indexer.