Splunk Search

compression rate of indexed data: 50gig/day in 3 weeks uses 100gig HDD space

jan_wohlers
Path Finder

Hey,

we just set up an indexer 3 weeks ago. By now we are indexing about 50 GB/24h. If I go to Manager -> Indexes, I can see that our main index only has a size of about 100 GB. Mostly just event logs are being indexed. Is the compression really good enough that about 20 days of 50 GB/day fit into a 100 GB index?

Thanks for your answer in advance!

Jan

1 Solution

MuS
Legend

Hi jan_wohlers,

Basically you can say a compression rate between 40-50% is normal; you can check this with this search:

| dbinspect index=_internal
| fields state,id,rawSize,sizeOnDiskMB 
| stats sum(rawSize) AS rawTotal, sum(sizeOnDiskMB) AS diskTotalinMB
| eval rawTotalinMB=(rawTotal / 1024 / 1024) | fields - rawTotal
| eval compression=tostring(round(diskTotalinMB / rawTotalinMB * 100, 2)) + "%"
| table rawTotalinMB, diskTotalinMB, compression
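
If you want to measure the main index from the question itself rather than Splunk's own _internal index, you can point the same search at it (assuming your index is actually named "main"):

| dbinspect index=main
| fields state,id,rawSize,sizeOnDiskMB
| stats sum(rawSize) AS rawTotal, sum(sizeOnDiskMB) AS diskTotalinMB
| eval rawTotalinMB=(rawTotal / 1024 / 1024) | fields - rawTotal
| eval compression=tostring(round(diskTotalinMB / rawTotalinMB * 100, 2)) + "%"
| table rawTotalinMB, diskTotalinMB, compression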

cheers,

MuS


jan_wohlers
Path Finder

Okay, thanks for the answer. The compression rate is 21%, which seems pretty good.


ibondarets
Explorer

Thank you for this handy search example!
I have a couple of questions regarding it:
1) How can I build a search that gives me a table of all present indexes with their compression ratio? I tried this:

| dbinspect index=*
| fields state,id,rawSize,sizeOnDiskMB
| stats sum(rawSize) AS rawTotal, sum(sizeOnDiskMB) AS diskTotalinMB by index
| eval rawTotalinMB=(rawTotal / 1024 / 1024) | fields - rawTotal
| eval compression=tostring(round(diskTotalinMB / rawTotalinMB * 100, 2)) + "%"
| table rawTotalinMB, diskTotalinMB, compression

but it didn't work.
2) What does it mean when I get this:
[screenshot not preserved in the archived post]

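
A likely reason the per-index version above returns nothing grouped by index: the fields command strips out the index field before stats tries to group by it. A sketch that keeps the field (and adds index to the final table) would be:

| dbinspect index=*
| fields index,state,id,rawSize,sizeOnDiskMB
| stats sum(rawSize) AS rawTotal, sum(sizeOnDiskMB) AS diskTotalinMB by index
| eval rawTotalinMB=(rawTotal / 1024 / 1024) | fields - rawTotal
| eval compression=tostring(round(diskTotalinMB / rawTotalinMB * 100, 2)) + "%"
| table index, rawTotalinMB, diskTotalinMB, compression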

ConnorG
Path Finder

I get a similar compression percentage to ibondarets.

Guess that means our data is actually larger once indexed.
