Getting Data In

Major discrepancy between per_sourcetype_thruput and tcpin_connections - which is right?

Jason
Motivator

I'm looking at a Splunk instance right now that is getting 99+% of its data as one particular sourcetype, from two heavy forwarders.

Running a search for index="_internal" source="*metrics.log" group=per_sourcetype_thruput | eval GB=kb/1024/1024 | timechart span=1d sum(GB) by series over the last 7 days gives a peak of 188 GB/day on Thursday.

But a search for index="_internal" source=*metrics.log group=tcpin_connections | eval GB=kb/1024/1024 | timechart span=1d sum(GB) by sourceHost (stacked) over the same period shows a similar curve but a peak of almost 500 GB/day on Thursday!

What is going on here?

Which metric is correct?

Are the heavy forwarders really adding an additional 150% to the bandwidth used, regardless of what actually gets indexed?

1 Solution

jbsplunk
Splunk Employee

They are both correct, but they measure different things. The discrepancy comes from the fact that per_sourcetype_thruput in metrics.log is governed by a limits.conf setting that caps the number of series reported in each 30-second snapshot, and the default is 10. So if you have more than 10 sourcetypes, per_sourcetype_thruput won't give you the full picture of total thruput across all sourcetypes. That doesn't make it useless, but it is incomplete. You may get a better idea from per_index_thruput, since most people probably don't have more than ten indexes. Anyway, that's the source of the discrepancy. Here is the setting from the docs:

http://docs.splunk.com/Documentation/Splunk/latest/admin/Limitsconf

[metrics]

maxseries = <integer>
 * The number of series to include in the per_x_thruput reports in metrics.log.
 * Defaults to 10.
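
If you want metrics.log to report more series per snapshot, you could raise that value in a local limits.conf (for example $SPLUNK_HOME/etc/system/local/limits.conf) and restart Splunk. This is just a sketch - 50 is an arbitrary number, size it to however many sourcetypes you actually have:

[metrics]
maxseries = 50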

Additionally, here is a page that is full of useful searches for troubleshooting data volume issues:

http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume

Specifically, I think you'll find the search for 'Counting event sizes over a time range' to be of use.
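
I don't remember the exact search on that page off the top of my head, but the general technique is to measure raw event sizes directly instead of going through metrics.log. Roughly something like this, where the index and time range are placeholders you'd adjust to your data:

index=your_index earliest=-7d@d latest=@d
| eval GB=len(_raw)/1024/1024/1024
| timechart span=1d sum(GB) by sourcetype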

Jason
Motivator

Nope, they're about the same as per_sourcetype_thruput. I didn't mention it before, but it may be helpful to know that this indexer is only receiving data from heavy forwarders. (I've edited the original post.)

hexx
Splunk Employee
Splunk Employee

What numbers do you see for index=_internal source=*metrics.log group=per_index_thruput | eval GB=kb/1024/1024 | timechart span=1d sum(GB)? Are they closer to what you get from tcpin_connections?
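
If it helps, something along these lines should put both measurements on a single chart for a direct comparison (just a sketch - both groups log a kb field in metrics.log):

index=_internal source=*metrics.log (group=per_index_thruput OR group=tcpin_connections)
| eval GB=kb/1024/1024
| timechart span=1d sum(GB) by group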

jbsplunk
Splunk Employee

That may or may not be true. It's entirely possible that the aggregate size of the sourcetypes that aren't in the top 10 accounts for a significant difference between these measurements. I think per_index_thruput is probably a better measurement if you've got fewer than 10 indexes configured.
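
One quick way to check whether the cap is actually in play (a sketch, not an official method): count how many distinct series show up in each 30-second metrics snapshot. If the count sits pinned at 10, per_sourcetype_thruput is hitting maxseries and under-reporting:

index=_internal source=*metrics.log group=per_sourcetype_thruput
| timechart span=30s dc(series) AS sourcetypes_reported

Run that over a short window, say the last hour, to keep the number of buckets reasonable.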

Jason
Motivator

I figured maxseries would not be an issue, since the test data sourcetype is consistently orders of magnitude larger than any other data coming into the indexer.
