Event latency due to intermittent forwarder and indexer timeout issues

splunkIT
Splunk Employee

Our site has 20+ indexers and 5000+ forwarders. The indexers and forwarders are running Splunk 6.0.x.

We suspect that TcpInputProc on the indexers stalls mid-flight and eventually recovers, at which point forwarding resumes. Because of this, events are reaching the indexers with noticeable latency. We have confirmed that all the indexers are in a "healthy" state, with no signs of index congestion or high memory or CPU utilization.

splunkd.log on the indexers reports:

ERROR TcpInputProc - Error encountered for connection from src=10.75.0.50:55302. Timeout Timed out after 600 seconds.

splunkd.log on the forwarders reports any number of the following messages:


ERROR TcpOutputFd - Connection to host=10.160.31.15:9997 failed
WARN TcpOutputFd - Connect to 10.160.31.15:9997 failed. Connection refused
WARN TcpOutputProc - Cooked connection to ip=10.160.31.15:9997 timed out

However, this message:
WARN  TcpOutputProc - Cooked connection to ip=10.160.31.15:9997 timed out
is reported more frequently than the others.
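
To put numbers on "more frequently", a small script can tally the forwarder messages by type. This is only a rough sketch: the regular expression is based on the lines quoted above, the category labels are made up for illustration, and real splunkd.log lines may carry prefixes or variations it does not anticipate.

```python
import re
from collections import Counter

# Matches the level, component, and message text as they appear in the
# lines quoted in this thread; anything before the level is ignored.
LINE_RE = re.compile(
    r"\b(ERROR|WARN)\s+(TcpOutputFd|TcpOutputProc|TcpInputProc)\s+-\s+(.*)"
)


def classify(message):
    """Map a message to a coarse category (labels are illustrative)."""
    if "timed out" in message.lower():
        return "timed out"
    if "Connection refused" in message:
        return "connection refused"
    if "failed" in message:
        return "connection failed"
    return "other"


def count_messages(lines):
    """Count matching log lines per (component, category) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group(2), classify(match.group(3)))] += 1
    return counts


# The forwarder messages quoted above, used as sample input.
SAMPLE = [
    "ERROR TcpOutputFd - Connection to host=10.160.31.15:9997 failed",
    "WARN TcpOutputFd - Connect to 10.160.31.15:9997 failed. Connection refused",
    "WARN TcpOutputProc - Cooked connection to ip=10.160.31.15:9997 timed out",
    "WARN  TcpOutputProc - Cooked connection to ip=10.160.31.15:9997 timed out",
]

for key, n in count_messages(SAMPLE).most_common():
    print(key, n)
```

To run it against a real log, pass an open file object to count_messages() instead of SAMPLE.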

splunkIT
Splunk Employee

The symptoms described above pertain to a known bug: SPL-84550: [splunktcp://] input does not inherit default setting "connection_host = ip" from [splunktcpin] stanza, leads to intermittent forwarder connection timeouts.

The root cause appears to be that the Splunk indexer (specifically the splunktcpin acceptor thread) gets stuck performing reverse DNS resolution for a handful of forwarders. While it is stuck, it refuses new forwarder connection requests or lets some forwarder connections time out (the default forwarder connection timeout is 20 seconds).
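
To see whether reverse DNS is slow for particular forwarders, you could time the lookups from an indexer host. The sketch below is not how splunkd itself resolves names; it just uses Python's standard socket.gethostbyaddr, and the function names and the 1-second threshold are arbitrary choices for illustration.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, elapsed seconds) for one reverse lookup."""
    start = time.monotonic()
    try:
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        hostname = None
    return hostname, time.monotonic() - start


def slow_lookups(ips, threshold_s=1.0, resolver=socket.gethostbyaddr):
    """List (ip, hostname, elapsed) for lookups at or above the threshold."""
    slow = []
    for ip in ips:
        hostname, elapsed = time_reverse_lookup(ip, resolver)
        if elapsed >= threshold_s:
            slow.append((ip, hostname, elapsed))
    return slow


# Example (performs real DNS queries when uncommented):
# print(slow_lookups(["10.75.0.50"]))
```

Feeding it the src addresses from the indexers' TcpInputProc errors would show whether those forwarders are the ones with slow or failing PTR lookups.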

The workaround is to set connection_host = ip in the indexers' inputs.conf. Example:


[splunktcp://9997]
connection_host = ip
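
With 20+ indexers it may help to verify that the setting is in place on each one. Below is a minimal sketch that scans the text of a single inputs.conf and lists any [splunktcp://...] stanza that does not set connection_host = ip. The parser and function names are my own, and it does not account for Splunk's configuration layering across multiple files, so treat the result as a hint; `splunk cmd btool inputs list` shows the merged settings.

```python
def parse_stanzas(text):
    """Parse simple [stanza] / key = value text into a dict of dicts."""
    stanzas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            stanzas.setdefault(current, {})
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            stanzas[current][key.strip()] = value.strip()
    return stanzas


def stanzas_missing_ip(text):
    """Return splunktcp:// stanzas that lack connection_host = ip."""
    return [
        name
        for name, settings in parse_stanzas(text).items()
        if name.startswith("splunktcp://")
        and settings.get("connection_host") != "ip"
    ]


# The workaround stanza from this thread passes the check.
EXAMPLE = "[splunktcp://9997]\nconnection_host = ip\n"
print(stanzas_missing_ip(EXAMPLE))
```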

Ellen
Splunk Employee

SPL-84550 is expected to be fixed in an upcoming maintenance release beyond 6.0.5.

SPL-85899 is the 6.1.x version of the bug and is fixed in the currently available 6.1.3 maintenance release.
