Why does an _internal index search on per_index_thruput show hosts forwarding less data than is being indexed?

hartfoml
Motivator

I have this search

index=_internal source="*metrics*" group="per_index_thruput" series="customindex" host="*MyIndexers*"

This gives me the sum(kb) for each "customindex" that is recorded in each indexer's metrics.log.

When I do this

index=_internal source="*metrics*" group="per_index_thruput" series="customindex" host!="*MyIndexers*"

This gives me the sum(kb) for each "customindex" that is recorded in each host's metrics.log.

Since the hosts are sending and the indexers are receiving, how is it that the hosts appear to send less than what is being received?

yannK
Splunk Employee

Because metrics.log only records a sample of the top 10 values every 30 seconds.
Therefore, if you have more than 10 hosts, you will only see the first 10 in the metrics.
See the remarks here: http://docs.splunk.com/Documentation/Splunk/6.1.4/Troubleshooting/Aboutmetricslog#Thruput_messages
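
As a rough illustration of why a top-10 cutoff undercounts, here is a minimal Python sketch. It is not Splunk's code: the series names, the 100 KB volumes, and both helper functions are hypothetical, and it only assumes that each interval logs the 10 largest series and drops the rest.

```python
def true_total(interval_volumes):
    """Sum every series in every interval (what was actually sent)."""
    return sum(sum(volumes.values()) for volumes in interval_volumes)


def top_n_logged_total(interval_volumes, n=10):
    """Sum only the n largest series per interval, as a top-N log would."""
    logged = 0.0
    for volumes in interval_volumes:
        largest = sorted(volumes.values(), reverse=True)[:n]
        logged += sum(largest)
    return logged


# Hypothetical data: 15 series, each sending 100 KB in each of 4 intervals.
intervals = [{f"series{i}": 100.0 for i in range(15)} for _ in range(4)]

print(true_total(intervals))          # 6000.0
print(top_n_logged_total(intervals))  # 4000.0
```

With 15 active series, a third of the volume never reaches the log, so a sum over the logged values comes out lower than what was really processed.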

If you want a more precise indexed volume, you can look at license_usage.log:
index=_internal source=*license_usage.log* type=Usage | stats sum(b) by idx
See http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume

hartfoml
Motivator

Thanks @yannK. I'm not yet on 6.1 (I will be upgrading soon), so I don't have the idx value in license_usage.log.

I'm looking through the 6.1.4 troubleshooting docs to see if they can help me with my 4.3 version until the upgrade planned in three weeks.

Thanks for the help 🙂

athorat
Communicator

@yannK
When I use license_usage.log, I get half the volume that I get from source="*metrics.log".

When I use
index="_internal" source=license_usage.log type=Usage | eval GB=b/(1024*1024*1024) | timechart span=d sum(GB)
I get 49 GB for a specific day,
and when I use
index="_internal" source="*metrics.log" per_index_thruput | eval GB=kb/(1024*1024) | timechart span=d sum(GB)
I get 98 GB for that same day.

So, as I understand it, metrics.log only returns the top 10 values every 30 seconds and will not give precise data?
But the results suggest otherwise.

yannK
Splunk Employee

You may be comparing apples and oranges.

  • metrics.log gives a rough estimate of the data ingested per index.
  • license_usage.log gives the reported volume of data counted against the license (excluding internal indexes, summary sourcetypes, etc.). Also, the licensing timestamp is "when the license master received the usage report from the indexers" (it may be backlogged if the indexers were not able to connect).

 

A side remark: when you search logs, always add a * at the end of the source, in case the logs rotated and some events are reported under the rotated log source (source=license_usage.log* also finds data in license_usage.log.1).
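
To see the effect of the trailing wildcard, here is a small Python sketch using shell-style matching as a stand-in. Splunk's own wildcard matching is a separate implementation, and the file paths below are only illustrative.

```python
from fnmatch import fnmatchcase

# Illustrative source paths: the live log and one rotated copy.
sources = [
    "/opt/splunk/var/log/splunk/license_usage.log",
    "/opt/splunk/var/log/splunk/license_usage.log.1",
]


def matching(pattern):
    """Return the sources that match a shell-style wildcard pattern."""
    return [s for s in sources if fnmatchcase(s, pattern)]


print(len(matching("*license_usage.log")))   # 1 (rotated file is missed)
print(len(matching("*license_usage.log*")))  # 2 (rotated file is included)
```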

 

hartfoml
Motivator

Thanks @ppablo_splunk for the edits to make the title more understandable. I appreciate your help.

ppablo
Retired

No problem @hartfoml 🙂 I appreciate the thanks!
