Splunk Search

Search query to get the amount of compressed data hitting the indexer

ansif
Motivator

Hello,

Is there a search query that returns the amount of compressed data arriving at the indexer before it is uncompressed and indexed?

(Compression has been enabled from the heavy forwarder to the indexer using compressed = true.)

nickhills
Ultra Champion

If you look at index=_internal metrics source="/opt/splunk/var/log/splunk/metrics.log" group=thruput, you will see sampled data volumes for each of your forwarding connections. This shows how much data was sent (and indexed), i.e. the raw, uncompressed data size.
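As a minimal sketch of turning those thruput samples into a daily total, the search below assumes it runs against the indexer's own metrics.log and that the overall indexing series is the one named index_thruput. `<Indexer>` is a hypothetical placeholder for your indexer's host name, and the name value is an assumption worth checking against the events in your own metrics.log first. Because the underlying events are sampled, treat the result as an estimate.

```spl
index=_internal source=*metrics.log* host="<Indexer>" group=thruput name=index_thruput earliest=-7d@d latest=@d
| timechart span=1d sum(kb) AS daily_kb
| eval daily_GB=round(daily_kb/1024/1024, 3)
```

Dividing kb by 1024 twice converts kilobytes to gigabytes, the same conversion used in the tcpout_connections search further down the thread.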

If you look at index=_internal metrics source="/opt/splunk/var/log/splunk/metrics.log" group=tcpout_connections on the forwarder, you will see a sampling of the data volume transmitted. If you have compression turned on, this reports the compressed size.
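To see how the compressed volume is spread across output connections, here is a sketch that splits the daily tcpout_connections total by the connection's name field. `<Sending_HF>` is a placeholder for the heavy forwarder's host name, and the assumption that name identifies the output group and destination indexer for each connection should be confirmed in your own metrics.log events.

```spl
index=_internal source=*metrics.log* host="<Sending_HF>" group=tcpout_connections earliest=-7d@d latest=@d
| eval compressed_GB=kb/1024/1024
| timechart span=1d sum(compressed_GB) AS compressed_GB by name
```

Converting each event to GB before timechart keeps every split-by column in the same unit, so per-indexer columns can be compared directly when the forwarder load-balances across several indexers.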

The important thing to note is that this is sampled data. It is designed to give you an indication of the volumes of data being processed at various stages of the pipeline, but it gives no guarantee about total volumes.

If my comment helps, please give it a thumbs up!

ansif
Motivator

@nickhillscpl: I tried the following search:

index="_internal" host="*<Sending_HF>*" source="/opt/splunk/var/log/splunk/metrics.log" group=tcpout_connections | timechart span=1d sum(kb) AS dailyGB | eval dailyGB=round(dailyGB/(1024*1024),3)

and got 5.334 GB for one day. I then tried the following SPL:

index=_internal host="<Licence_Master>"  type="RolloverSummary" earliest=-30d@d   | eval _time=_time - 43200 | bin _time span=1d | stats latest(b) AS b by slave, pool, _time | timechart span=1d sum(b) AS "volume" fixedrange=false | join type=outer _time [search index=_internal [`set_local_host`] source=*license_usage.log* type="RolloverSummary" earliest=-30d@d | eval _time=_time - 43200 | bin _time span=1d | stats latest(stacksz) AS "stack size" by _time] | fields - _timediff  | foreach * [eval <<FIELD>>=round('<<FIELD>>'/1024/1024/1024, 3)]

and got 10.136 GB for the same day.

Does this mean that 10.136 GB of data was compressed to only 5.334 GB, i.e. roughly a 2:1 compression ratio?
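Taking the two daily figures at face value, the implied ratio and the reduction in transmitted volume work out as follows (keeping in mind that the compressed figure comes from sampled metrics.log data, while the raw figure comes from license usage):

```latex
\text{ratio} = \frac{10.136\ \text{GB (raw)}}{5.334\ \text{GB (compressed)}} \approx 1.90
\qquad
1 - \frac{5.334}{10.136} \approx 0.474
```

So the numbers suggest about 1.9:1, or roughly 47% less data on the wire for that day.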

nickhills
Ultra Champion

Just to restate: metrics.log contains sampled values. It only looks at the data every 30 seconds or so and estimates the volumes. license_usage.log, on the other hand, is the definitive record of how much (uncompressed) data was written to the indexes.

So your first query involves a bit of guesswork from Splunk, while the second query is hard fact.
A 2:1 ratio is quite plausible for some log types, but you may find the implied compression ratio falls if you run the comparison over a longer time period.
With that said, it's probably as good an estimate as you can get (unless you compare it against the hard facts from Splunk Stream, as per your other question).
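One way to run the comparison over a longer period is a sketch that puts both daily figures side by side for 30 days and computes a ratio per day, so you can see how stable it is. It assumes the raw volume is taken from the type=Usage events in license_usage.log (searchable wherever the license master's _internal data is available) and that all licensed data arrives through the heavy forwarder `<Sending_HF>`, a placeholder for its host name. If other forwarders also send data, the raw total and therefore the ratio will be overstated.

```spl
index=_internal source=*license_usage.log* type=Usage earliest=-30d@d latest=@d
| bin _time span=1d
| stats sum(b) AS raw_bytes by _time
| append
    [ search index=_internal source=*metrics.log* host="<Sending_HF>" group=tcpout_connections earliest=-30d@d latest=@d
    | bin _time span=1d
    | stats sum(kb) AS compressed_kb by _time ]
| stats sum(raw_bytes) AS raw_bytes sum(compressed_kb) AS compressed_kb by _time
| eval raw_GB=round(raw_bytes/1024/1024/1024, 3)
| eval compressed_GB=round(compressed_kb/1024/1024, 3)
| eval ratio=round(raw_bytes/(compressed_kb*1024), 2)
| fields _time raw_GB compressed_GB ratio
```

Using type=Usage rather than RolloverSummary avoids the midnight time shift in the license report search, because Usage events are written throughout the day. The ratio is computed from the unrounded byte counts so that rounding the GB columns does not skew it.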
