Splunk Search

How to show the total size of events from a source?

a212830
Champion

Hi,

I need to show a customer that Splunk is processing their entire file. I thought a good way to do that would be to calculate the total size of events from particular sources and then compare it to the size of the logfile itself. Is this possible? If so, how?


woodcock
Esteemed Legend

You should pick the best answer that got you to a solution and click Accept to close the question.


deepak_acalvio
Explorer

You can use the license_usage.log file, as suggested by sloshburch.

Here is the query:
index=_internal source="*license_usage.log*" type=Usage | stats sum(eval(b/1024/1024)) AS volume_mb by s

This gives the volume of each source (s) in MB.

joesrepsol
Path Finder

GREAT query. Using this one now and very helpful. Thanks so much!


sloshburch
Splunk Employee

Don't forget the license_usage.log file. Assuming there is no congestion, license_usage.log records the bytes (b) for any source (s), sourcetype (st), index (i), or host (h). You can therefore add up (sum) the bytes recorded for that file's source to show its true size. Alternatively, the rollover events written each night show a summary statistic of the same.

If there is no value for those fields, then you may be on an old version of Splunk, or there was index congestion.


fredclown
Contributor

The question is about source, so unfortunately license_usage.log will not be accurate in most environments. In a small Splunk environment it will probably work, but Splunk squashes the values of source and host to keep the event count in license_usage.log down. It doesn't squash index or sourcetype, so those remain accurate. If you are trying to use host or source in an environment that is not small, this will most likely be less accurate than summing the lengths of all the _raw data.


woodcock
Esteemed Legend

If you are using the default LINE_BREAKER, which means each line is a single event, then you can count lines. If you are sending all of the data (not diverting any to nullQueue), then you can count bytes.

Both like this:

index=* source=MyFile | eval bytes=len(_raw) | stats count AS Lines sum(bytes) AS Bytes by source

sloshburch
Splunk Employee

I used to do it this way, but recently learned that it won't be 100% accurate because:

  1. it assumes that len and the license counter measure the same thing (they don't: len measures characters, while the license counter measures bytes)
  2. it assumes that there's no delay or lag in indexing. _indextime is not the same as _time. Sometimes forwarders get backed up, and an item may be indexed some time after its _time value.
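
The first point is easy to see outside Splunk. Here is a minimal Python sketch, assuming the log is UTF-8 encoded; the sample line is made up purely for illustration:

```python
# Hypothetical log line with three non-ASCII characters (é written as \u00e9).
line = "user=Jos\u00e9 status=d\u00e9clin\u00e9"

# Number of characters, which is what a character-based length counts.
char_count = len(line)

# Number of bytes the same line occupies when stored as UTF-8 (an assumption
# about the file's encoding; other encodings give other byte counts).
byte_count = len(line.encode("utf-8"))

print("characters:", char_count)
print("bytes:", byte_count)
```

For pure ASCII data the two numbers match, which is why the difference tends to go unnoticed until the logs contain accented or non-Latin text.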

woodcock
Esteemed Legend

These are both excellent points and my answer was very US-centric and not fully qualified.


richgalloway
SplunkTrust

If you are comparing byte or character counts, be aware that Splunk does not index LINE_BREAKER characters ([\r\n], by default), so allow for that in your comparison.
I would probably compare the event count in Splunk to the number of lines in the log file, assuming a 1:1 ratio. This may not work if you merge multiple lines into a single event or split lines into multiple events.
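
For the logfile side of that comparison, something like the following Python sketch could produce the numbers to hold against Splunk's event count and byte total. The function name measure_log and the demo file are hypothetical, and the sketch assumes one event per line with only trailing \r and \n dropped at index time:

```python
import os
import tempfile


def measure_log(path):
    """Return (line_count, payload_bytes, total_bytes) for a log file.

    payload_bytes excludes the trailing \r and \n of each line, on the
    assumption that those line-breaker characters are not indexed.
    """
    lines = payload = total = 0
    with open(path, "rb") as fh:  # binary mode so we count bytes, not characters
        for raw in fh:
            lines += 1
            total += len(raw)
            payload += len(raw.rstrip(b"\r\n"))
    return lines, payload, total


# Small demo on a throwaway file with mixed line endings.
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
    tmp.write(b"first event\r\nsecond event\n")
    demo_path = tmp.name

result = measure_log(demo_path)
os.remove(demo_path)
print(result)
```

The line count is the figure to compare with the event count, and payload_bytes is the closer match for a byte total. Neither will line up if events are merged across lines, split, or routed to nullQueue.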


woodcock
Esteemed Legend

This is another VERY excellent point.
