Getting Data In

How to figure out index disk space footprint on indexers?

EdgarAllenProse
Path Finder

I was struggling to find short- and long-term estimates of how much space each index consumed in each state. If you are trying to plan capacity or taking over an older deployment, your two friends are the dbinspect command and the Monitoring Console. Seriously, try to avoid the internal splunkd metrics unless you are looking at license volume.

The Monitoring Console deserves a course of its own, but using dbinspect I was able to find, by bucket and state, both compressed and uncompressed volumes, and to work out a decent estimate of how I should configure my indexes.conf.
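If you just want a quick per-index view before running the full search below, a minimal dbinspect starter looks like this (sizeOnDiskMB and state are fields dbinspect returns):

| dbinspect index=*
| stats sum(sizeOnDiskMB) as totalMB by index, state
| sort - totalMB

That gives you disk footprint per index and bucket state at a glance; the longer search below breaks it down by bucket age so you can estimate daily volume.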

The search used (which desperately needs cleaning, as it has plenty of unnecessary stats tables written into it):

| dbinspect index=* 
| search tsidxState="full" bucketId=*
    | eval ageDays=round((endEpoch-startEpoch)/86400,10)
| stats min(startEpoch) as MinStartTime max(startEpoch) as MaxStartTime min(endEpoch) as MinEndTime max(endEpoch) as MaxEndTime max(hostCount) as MaxHosts max(sourceTypeCount) as MaxSourceTypes sum(eventCount) as TotalEvents sum(rawSize) as rawSizeBytes sum(sizeOnDiskMB) as sizeOnDiskBytes values(ageDays) as ageDays dc(bucketId) as countBuckets by index, bucketId, state 
    | where ageDays<90 AND ageDays>0.0000000000 
    | eval sizeOnDiskBytes=round(sizeOnDiskBytes*pow(1024,2))
    | eval dailyDisk=round(sizeOnDiskBytes/ageDays,5)
    | eval dailyRaw=round(rawSizeBytes/ageDays,5)
    | eval dailyEventCount=round(TotalEvents/ageDays)
| table index bucketId state dailyDisk ageDays rawSizeBytes sizeOnDiskBytes TotalEvents dailyRaw dailyEventCount
| stats sum(dailyDisk) as dailyBDiskBucket, values(ageDays), sum(dailyRaw) as dailyBRaw sum(dailyEventCount) as dailyEvent, avg(dailyDisk) as dailyBDiskAvg, avg(dailyRaw) as dailyBRawAvg, avg(dailyEventCount) as dailyEventAvg, dc(bucketId) as countBucket by index, state, ageDays
    | eval bPerEvent=round(dailyBDiskBucket/dailyEvent)
    | eval bPerEventRaw=round(dailyBRaw/dailyEvent)
| table dailyBDiskBucket index ageDays dailyEvent bPerEvent dailyBRaw bPerEventRaw state
    | sort ageDays
| stats sum(dailyBDiskBucket) as Vol_totDBSize, avg(dailyBDiskBucket) as Vol_avgDailyIndexed, max(dailyBDiskBucket) as Vol_largestVolBucket, avg(dailyEvent) as avgEventsPerDay, avg(bPerEvent) as Vol_avgVolPerEvent, avg(dailyBRaw) as Vol_avgDailyRawVol, avg(bPerEventRaw) as Vol_avgVolPerRawEvent, range(ageDays) as rangeAge by index, state
    | foreach Vol_* [eval <<FIELD>>=if(<<FIELD>> >= pow(1024,3), tostring(round(<<FIELD>>/pow(1024,3),3))+ " GB", if(<<FIELD>> >= pow(1024,2), tostring(round(<<FIELD>>/pow(1024,2),3))+ " MB", if(<<FIELD>> >= pow(1024,1), tostring(round(<<FIELD>>/pow(1024,1),3))+ " KB", tostring(round(<<FIELD>>)) + " bytes")))]
    | rename Vol_* as *
    | eval comb="Index Avg/day: " + avgDailyIndexed + "," + "Raw Avg/day: " + avgDailyRawVol + "," + "DB Size: " + totDBSize + "," + "Per Event Avg/Vol: " + avgVolPerEvent + "," + "Retention Range: " + tostring(round(rangeAge))
    | eval comb = split(comb,",")
| xyseries index state comb
| table index hot warm cold 

This search helped a lot in knowing where to move forward with configuration changes. Hopefully it helps you avoid the trip into wonderland. The main area you want to look at is Index Avg/day: that's the compressed value, i.e. what is actually written to disk.
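As an example of putting the numbers to use (the index name and values below are placeholders, not recommendations): if Index Avg/day comes out around 2 GB for an index and you need roughly 90 days of retention, a first-pass indexes.conf stanza might look like:

[my_index]
# ~2 GB/day * 90 days, expressed in MB (2048 * 90)
maxTotalDataSizeMB = 184320
# 90 days in seconds (90 * 86400)
frozenTimePeriodInSecs = 7776000

Leave yourself headroom on top of the measured average, since daily volume varies and buckets roll on whichever limit (size or age) is hit first.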

1 Solution

EdgarAllenProse
Path Finder

Oh! Also, make sure the search is run over All Time, and if you are running an indexer cluster, run the query on the cluster master; otherwise, run it on the indexer.


