Hello,
Is there any search query that returns the amount of compressed data arriving at the indexer, before it is uncompressed and indexed?
(Compression has been enabled from the Heavy Forwarder to the Indexer using compressed = true.)
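For reference, a minimal sketch of the relevant settings (the stanza name, hostname, and port below are examples, not taken from your environment). Note that for non-SSL connections, compression must be enabled on both sides or the connection will fail:

```
# outputs.conf on the Heavy Forwarder (example stanza name and port)
[tcpout:my_indexers]
server = indexer1.example.com:9997
compressed = true

# inputs.conf on the Indexer - must match the forwarder's setting
[splunktcp://9997]
compressed = true
```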
If you look at index=_internal source="/opt/splunk/var/log/splunk/metrics.log" group=thruput
you will see sampled data volumes for each of your forwarding connections. This shows you how much data was sent (and indexed), i.e. the raw data size.
If you look at index=_internal source="/opt/splunk/var/log/splunk/metrics.log" group=tcpout_connections
you will see a sampling of the data volume transmitted; if you have compression turned on, this reports the compressed size.
The important thing to note is that this is sampled data: it is designed to give you an indication of the volumes of data being processed at various stages of the pipeline, but it does not give you any kind of guarantee about total volumes.
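Side by side, the two searches might look like this (a sketch only; the source path and the daily span are assumptions, and the sampled kb field is summed per day):

```
index=_internal source="/opt/splunk/var/log/splunk/metrics.log" group=thruput
| timechart span=1d sum(kb) AS raw_kb

index=_internal source="/opt/splunk/var/log/splunk/metrics.log" group=tcpout_connections
| timechart span=1d sum(kb) AS sent_kb
```

Comparing raw_kb and sent_kb for the same day gives you a rough sense of the compression ratio on the wire.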
@nickhillscpl : I tried the below search:
index="_internal" host="*<Sending_HF>*" source="/opt/splunk/var/log/splunk/metrics.log" group=tcpout_connections | timechart span=1d sum(kb) AS dailyGB | eval dailyGB=round(dailyGB/(1024*1024),3)
and I got 5.334 GB for one day. Then I tried the below SPL:
index=_internal host="<Licence_Master>" type="RolloverSummary" earliest=-30d@d | eval _time=_time - 43200 | bin _time span=1d | stats latest(b) AS b by slave, pool, _time | timechart span=1d sum(b) AS "volume" fixedrange=false | join type=outer _time [search index=_internal [`set_local_host`] source=*license_usage.log* type="RolloverSummary" earliest=-30d@d | eval _time=_time - 43200 | bin _time span=1d | stats latest(stacksz) AS "stack size" by _time] | fields - _timediff | foreach * [eval <<FIELD>>=round('<<FIELD>>'/1024/1024/1024, 3)]
and I got 10.136 GB for the same day.
Does this mean that 10.136 GB of data was compressed down to 5.334 GB,
i.e. roughly a 2:1 compression ratio?
Well, just to state it again: metrics.log
contains sampled values; it only looks at the data every 30 seconds or so and estimates the volumes. license_usage.log
is the gospel truth about how much (uncompressed) data was written to the indexes.
So your first query is a bit of guesswork from Splunk; the second query is hard fact.
2:1 is quite plausible for some log types, but you may find that the apparent compression ratio falls if you run it over a longer time period.
With that said, it's probably as good an estimate as you can get (unless you compare with the hard facts from Stream, as per your other question).
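If you want a single search that estimates the ratio per day, a sketch along these lines may help (the field names raw_gb, sent_gb, and ratio are made up here for illustration, and the tcpout side is still sampled, so treat the result as an estimate):

```
index=_internal source=*license_usage.log* type="RolloverSummary"
| bin _time span=1d
| stats latest(b) AS raw_bytes by _time
| join type=outer _time
    [ search index=_internal source=*metrics.log* group=tcpout_connections
      | bin _time span=1d
      | stats sum(kb) AS sent_kb by _time ]
| eval raw_gb=round(raw_bytes/1024/1024/1024, 3)
| eval sent_gb=round(sent_kb/1024/1024, 3)
| eval ratio=round(raw_gb/sent_gb, 2)
| table _time raw_gb sent_gb ratio
```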