Getting Data In

Major discrepancy between per_sourcetype_thruput and tcpin_connections - which is right?

Jason
Motivator

I'm looking at a Splunk instance right now that is getting 99+% of its data as one particular sourcetype, from two heavy forwarders.

Running the following search over the last 7 days gives a peak of 188 GB/day on Thursday:

index="_internal" source="*metrics.log" per_sourcetype_thruput | eval GB=kb/1024/1024 | timechart span=1d sum(GB) by series

But a stacked timechart over the same period from this search shows a similar curve with a peak of almost 500 GB/day on Thursday!

index="_internal" source=*metrics.log group=tcpin_connections | eval GB=kb/1024/1024 | timechart span=1d sum(GB) by sourceHost

What is going on here?

Which metric is correct?

Is the heavy forwarder really adding an additional 150% to the amount of bandwidth used, regardless of what actually gets indexed?

1 Solution

jbsplunk
Splunk Employee

They are both correct, but they are measuring different things. The discrepancy exists because per_sourcetype_thruput in metrics.log is governed by a limits.conf setting that caps the number of series reported in each 30-second sampling interval, and the default is 10. So if you've got more than 10 sourcetypes, per_sourcetype_thruput won't reflect the total thruput of all sourcetypes. That doesn't mean it isn't useful, but you just don't have the full picture. You might get a better idea by looking at per_index_thruput, as most people probably don't have more than ten indexes. Anyway, that's the source of the discrepancy. Here is the setting from the docs:

http://docs.splunk.com/Documentation/Splunk/latest/admin/Limitsconf

[metrics]

maxseries = <integer>
 * The number of series to include in the per_x_thruput reports in metrics.log.
 * Defaults to 10.
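
If the 10-series cap is what's hiding data, you can raise it. A minimal sketch of what that might look like in limits.conf on the indexer (the value 50 is only an illustration, and reporting more series adds a little overhead to metrics.log):

[metrics]
# Illustrative value: report up to 50 series per interval instead of the default 10
maxseries = 50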

Additionally, here is a page that is full of useful searches for troubleshooting data volume issues:

http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume

Specifically, I think you'll find the search for 'Counting event sizes over a time range' to be of use.
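
As a rough sketch of that approach (not necessarily the exact search on the wiki page; the sourcetype and time range are placeholders), you can sum the raw length of the events themselves and compare that to what metrics.log reports:

index=* sourcetype=<your_sourcetype> earliest=-7d@d latest=@d | eval GB=len(_raw)/1024/1024/1024 | timechart span=1d sum(GB)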


Jason
Motivator

Nope, they're about the same as per_sourcetype_thruput. I didn't mention it before, but it may be helpful to know that this indexer only receives data from heavy forwarders. (I've edited the original post.)


hexx
Splunk Employee

What numbers do you see for the following search? Are they closer to what you find from tcpin_connections?

index=_internal source=*metrics.log per_index_thruput | eval GB=kb/1024/1024 | timechart span=1d sum(GB)

jbsplunk
Splunk Employee

That may or may not be true. It's entirely possible that the aggregate size of the sourcetypes that aren't in the top 10 accounts for the significant difference between these measurements. I think per_index_thruput is probably a better measurement if you've got fewer than 10 indexes configured.
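
One way to gauge how much the top-10 cap is hiding (a sketch, run against the same indexer's metrics.log) is to chart the daily totals of per_index_thruput and per_sourcetype_thruput side by side; any persistent gap is volume that per_sourcetype_thruput is not attributing to a series:

index=_internal source=*metrics.log (group=per_index_thruput OR group=per_sourcetype_thruput) | eval GB=kb/1024/1024 | timechart span=1d sum(GB) by group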


Jason
Motivator

I figured maxseries would not be an issue, since the test data sourcetype is consistently orders of magnitude larger than any other type of data coming into the indexer.
