Splunk Search

Search query to get the amount of compressed data arriving at the indexer

ansif
Motivator

Hello,

Is there a search query that returns the amount of compressed data arriving at the indexer before it is uncompressed and indexed?

(Compression has been enabled from the Heavy Forwarder to the Indexer using compression = true.)

1 Solution

nickhills
Ultra Champion

If you look in index=_internal metrics source="/opt/splunk/var/log/splunk/metrics.log" group=thruput you will see sampled data volumes for each of your forwarding connections. This shows how much data was sent (and indexed), i.e. the raw data size.

If you look at index=_internal metrics source="/opt/splunk/var/log/splunk/metrics.log" group=tcpout_connections you will see a sampling of the data volume transmitted. If you have compression turned on, this reports the compressed size.

The important thing to note is that this is sampled data. It is designed to give you an indication of the volumes of data being processed at various stages of the pipeline, but it does not give you any guarantee about total volumes.

If my comment helps, please give it a thumbs up!



ansif
Motivator

@nickhillscpl: I tried the search below:

index="_internal" host="*<Sending_HF>*" source="/opt/splunk/var/log/splunk/metrics.log" group=tcpout_connections | timechart span=1d sum(kb) AS dailyGB | eval dailyGB=round(dailyGB/(1024*1024),3)

and got 5.334 GB for one day. I then tried the SPL below:

index=_internal host="<Licence_Master>"  type="RolloverSummary" earliest=-30d@d   | eval _time=_time - 43200 | bin _time span=1d | stats latest(b) AS b by slave, pool, _time | timechart span=1d sum(b) AS "volume" fixedrange=false | join type=outer _time [search index=_internal [`set_local_host`] source=*license_usage.log* type="RolloverSummary" earliest=-30d@d | eval _time=_time - 43200 | bin _time span=1d | stats latest(stacksz) AS "stack size" by _time] | fields - _timediff  | foreach * [eval <<FIELD>>=round('<<FIELD>>'/1024/1024/1024, 3)]

and got 10.136 GB for the same day.

Does this mean 10.136 GB of data was compressed to just 5.334 GB?

Is that a 2:1 compression ratio?


nickhills
Ultra Champion

Well, just to state it again: metrics.log contains sampled values. It only looks at the data every 30 seconds or so and estimates the volumes. license_usage.log is the gospel truth about how much (uncompressed) data was written to the indexes.

So your first query is a bit of guesswork from Splunk. The second query is hard fact.
2:1 is quite plausible for some log types, but you may find the suggested compression rate falls if you run it over a longer time period.
With that said, it's probably as good an estimate as you can get (unless you compare with the hard facts from Stream, as per your other question).
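
For reference, the arithmetic behind the "2:1" figure can be checked with a few lines of Python. This is only a sketch using the two daily totals quoted in this thread; the function names are made up for illustration, and it assumes both searches cover the same data for the same day, which the sampled metrics.log numbers cannot guarantee.

```python
# Daily totals quoted in the thread, both already in GB.
compressed_gb = 5.334     # sum(kb) from group=tcpout_connections (sampled estimate)
uncompressed_gb = 10.136  # RolloverSummary volume from license_usage.log


def kb_to_gb(kb: float) -> float:
    """Same conversion as the eval in the first search: KB / (1024 * 1024)."""
    return kb / (1024 * 1024)


def compression_ratio(uncompressed: float, compressed: float) -> float:
    """Return uncompressed size divided by compressed size."""
    if compressed <= 0:
        raise ValueError("compressed size must be positive")
    return uncompressed / compressed


ratio = compression_ratio(uncompressed_gb, compressed_gb)
# Share of the original volume that did not have to cross the wire.
savings_pct = (1 - compressed_gb / uncompressed_gb) * 100

print(f"ratio ~ {ratio:.2f}:1, bandwidth saved ~ {savings_pct:.1f}%")
```

With these inputs the ratio comes out at roughly 1.9:1, so "2:1" is a fair rounding, subject to the sampling caveat above.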

If my comment helps, please give it a thumbs up!