Getting Data In

Major discrepancy between per_sourcetype_thruput and tcpin_connections - which is right?

Jason
Motivator

I'm looking at a Splunk instance right now that is getting 99+% of its data as one particular sourcetype, from two heavy forwarders.

Running a search for index="_internal" source="*metrics.log" group=per_sourcetype_thruput | eval GB=kb/1024/1024 | timechart span=1d sum(GB) by series over the last 7 days gives a peak of 188 GB/day on Thursday.

But a search for index="_internal" source=*metrics.log group=tcpin_connections | eval GB=kb/1024/1024 | timechart span=1d sum(GB) by sourceHost (stacked) over the same period shows a similar curve but a peak of almost 500 GB/day on Thursday!

What is going on here?

Which metric is correct?

Are the heavy forwarders really adding an additional 150% to the bandwidth used, regardless of what actually gets indexed?

1 Solution

jbsplunk
Splunk Employee

They are both correct, but they measure different things. The discrepancy comes from the fact that per_sourcetype_thruput in metrics.log is governed by a limits.conf setting that caps the number of series reported in each 30-second snapshot, and the default is 10. So if you have more than 10 sourcetypes, per_sourcetype_thruput won't give you the full picture of total thruput across all sourcetypes. That doesn't make it useless, but it is incomplete. You may get a better idea from per_index_thruput, since most people probably don't have more than ten indexes. Anyway, that's the source of the discrepancy. Here is the setting from the docs:

http://docs.splunk.com/Documentation/Splunk/latest/admin/Limitsconf

[metrics]

maxseries = <integer>
 * The number of series to include in the per_x_thruput reports in metrics.log.
 * Defaults to 10.
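
If you want metrics.log to report more series per snapshot, you could raise that value in a local limits.conf (for example $SPLUNK_HOME/etc/system/local/limits.conf) and restart Splunk. This is just a sketch - 50 is an arbitrary number, size it to however many sourcetypes you actually have:

[metrics]
maxseries = 50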

Additionally, here is a page that is full of useful searches for troubleshooting data volume issues:

http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume

Specifically, I think you'll find the search for 'Counting event sizes over a time range' to be of use.
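
I don't remember the exact search on that page off the top of my head, but the general technique is to measure raw event sizes directly instead of going through metrics.log. Roughly something like this, where the index and time range are placeholders you'd adjust to your data:

index=your_index earliest=-7d@d latest=@d
| eval GB=len(_raw)/1024/1024/1024
| timechart span=1d sum(GB) by sourcetype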

Jason
Motivator

Nope, they're about the same as per_sourcetype_thruput. I didn't mention it before, but it may be helpful to know that this indexer is only receiving data from heavy forwarders. (I've edited the original post.)

hexx
Splunk Employee
Splunk Employee

What numbers do you see for index=_internal source=*metrics.log group=per_index_thruput | eval GB=kb/1024/1024 | timechart span=1d sum(GB)? Are they closer to what you get from tcpin_connections?
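
If it helps, something along these lines should put both measurements on a single chart for a direct comparison (just a sketch - both groups log a kb field in metrics.log):

index=_internal source=*metrics.log (group=per_index_thruput OR group=tcpin_connections)
| eval GB=kb/1024/1024
| timechart span=1d sum(GB) by group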

jbsplunk
Splunk Employee

That may or may not be true. It's entirely possible that the aggregate size of the sourcetypes that aren't in the top 10 accounts for a significant difference between these measurements. I think per_index_thruput is probably a better measurement if you've got fewer than 10 indexes configured.
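
One quick way to check whether the cap is actually in play (a sketch, not an official method): count how many distinct series show up in each 30-second metrics snapshot. If the count sits pinned at 10, per_sourcetype_thruput is hitting maxseries and under-reporting:

index=_internal source=*metrics.log group=per_sourcetype_thruput
| timechart span=30s dc(series) AS sourcetypes_reported

Run that over a short window, say the last hour, to keep the number of buckets reasonable.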

Jason
Motivator

I figured maxseries would not be an issue, since the test data sourcetype is consistently orders of magnitude larger than any other data coming into the indexer.
