Hello,
Is there any search query that returns the amount of compressed data arriving at the indexer, before it is uncompressed and indexed?
(Compression has been enabled from the Heavy Forwarder to the Indexer using compressed = true.)
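For reference, a minimal sketch of the relevant settings (the stanza name, hostname, and port below are examples, not taken from your environment). Note that for non-SSL connections, compression must be enabled on both sides or the connection will fail:

```
# outputs.conf on the Heavy Forwarder (example stanza name and port)
[tcpout:my_indexers]
server = indexer1.example.com:9997
compressed = true

# inputs.conf on the Indexer - must match the forwarder's setting
[splunktcp://9997]
compressed = true
```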
If you look at index=_internal source="/opt/splunk/var/log/splunk/metrics.log" group=thruput
you will see sampled data volumes for each of your forwarding connections. This shows you how much data was sent (and indexed), i.e. the raw data size.
If you look at index=_internal source="/opt/splunk/var/log/splunk/metrics.log" group=tcpout_connections
you will see a sampling of the data volume transmitted; if you have compression turned on, this reports the compressed size.
The important thing to note is that this is sampled data: it is designed to give you an indication of the volumes of data being processed at various stages of the pipeline, but it does not give you any kind of guarantee about total volumes.
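Side by side, the two searches might look like this (a sketch only; the source path and the daily span are assumptions, and the sampled kb field is summed per day):

```
index=_internal source="/opt/splunk/var/log/splunk/metrics.log" group=thruput
| timechart span=1d sum(kb) AS raw_kb

index=_internal source="/opt/splunk/var/log/splunk/metrics.log" group=tcpout_connections
| timechart span=1d sum(kb) AS sent_kb
```

Comparing raw_kb and sent_kb for the same day gives you a rough sense of the compression ratio on the wire.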
@nickhillscpl : I tried the below search:
index="_internal" host="*<Sending_HF>*" source="/opt/splunk/var/log/splunk/metrics.log" group=tcpout_connections | timechart span=1d sum(kb) AS dailyGB | eval dailyGB=round(dailyGB/(1024*1024),3)
and I got 5.334 GB for one day. Then I tried the below SPL:
index=_internal host="<Licence_Master>" type="RolloverSummary" earliest=-30d@d | eval _time=_time - 43200 | bin _time span=1d | stats latest(b) AS b by slave, pool, _time | timechart span=1d sum(b) AS "volume" fixedrange=false | join type=outer _time [search index=_internal [`set_local_host`] source=*license_usage.log* type="RolloverSummary" earliest=-30d@d | eval _time=_time - 43200 | bin _time span=1d | stats latest(stacksz) AS "stack size" by _time] | fields - _timediff | foreach * [eval <<FIELD>>=round('<<FIELD>>'/1024/1024/1024, 3)]
and I got 10.136 GB for the same day.
Does this mean that 10.136 GB of data was compressed down to 5.334 GB,
i.e. roughly a 2:1 compression ratio?
Well, just to state it again: metrics.log
contains sampled values; it only looks at the data every 30 seconds or so and estimates the volumes. license_usage.log
is the gospel truth about how much (uncompressed) data was written to the indexes.
So your first query is a bit of guesswork from Splunk; the second query is hard fact.
2:1 is quite plausible for some log types, but you may find that the apparent compression ratio falls if you run it over a longer time period.
With that said, it's probably as good an estimate as you can get (unless you compare with the hard facts from Stream, as per your other question).
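If you want a single search that estimates the ratio per day, a sketch along these lines may help (the field names raw_gb, sent_gb, and ratio are made up here for illustration, and the tcpout side is still sampled, so treat the result as an estimate):

```
index=_internal source=*license_usage.log* type="RolloverSummary"
| bin _time span=1d
| stats latest(b) AS raw_bytes by _time
| join type=outer _time
    [ search index=_internal source=*metrics.log* group=tcpout_connections
      | bin _time span=1d
      | stats sum(kb) AS sent_kb by _time ]
| eval raw_gb=round(raw_bytes/1024/1024/1024, 3)
| eval sent_gb=round(sent_kb/1024/1024, 3)
| eval ratio=round(raw_gb/sent_gb, 2)
| table _time raw_gb sent_gb ratio
```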