winsock error 10054

erga00 · 02-23-2011

I'm getting thousands of instances of the following error on my indexers.

02-16-2011 02:20:02.921 ERROR TcpInputProc - Error encountered for connection from host=ftlpprint01, ip=10.13.98.68. Winsock error 10054

According to MSDN, error 10054 (WSAECONNRESET) means the connection was forcibly closed by the remote host, which in my case would be a forwarder or LWF.

We have 6 indexers, all running Splunk 4.1.6 on Windows 2008 R2 x64. There are ~150 LWFs and 6 regular forwarders, all running Windows and Splunk 4.1.6, and all of them send to the indexers using AutoLB. All generate that error to some extent. The odd thing is that about 20% of the forwarders connect over the WAN, but the majority of the messages are for connections with a forwarder in the same site as the indexers.

Is this normal (it doesn't seem like it), and if not, how would you suggest I start troubleshooting?

jrodman · 02-23-2011

Some day, I should sit down and actually read TCP/IP Illustrated. I'm only familiar with 'connection reset by peer' from bumping into it. As I understand it, it means the other end of the socket no longer knows about the connection. That can happen when the other side shuts down the link abruptly, the normal shutdown handshake doesn't happen for some reason, and your side only finds out late, when it next tries to use the connection. More likely, though, some network equipment or firewall has decided it no longer wants to track or keep up this socket.
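To make the "abrupt shutdown" case concrete, here is a minimal sketch that provokes a reset on the loopback interface. It has nothing to do with Splunk itself: the function name `provoke_reset` is made up for illustration, and the `SO_LINGER` packing shown assumes a Linux-style `struct linger` with two ints (Windows uses two 16-bit fields, so the format string would need adjusting there).

```python
import socket
import struct


def provoke_reset() -> str:
    """Return what the receiving side observes after an abortive close."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender.connect(listener.getsockname())
    receiver, _ = listener.accept()

    # l_onoff=1, l_linger=0: close() discards the connection and sends RST
    # instead of the normal FIN handshake (assumes a two-int struct linger).
    sender.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sender.close()

    try:
        receiver.recv(1024)  # blocks until the peer's close or reset arrives
        outcome = "clean close"
    except ConnectionResetError as exc:
        outcome = type(exc).__name__
    finally:
        receiver.close()
        listener.close()
    return outcome


if __name__ == "__main__":
    print(provoke_reset())
```

The receiving side here plays the role of the indexer: it learns about the reset only when it next touches the socket. A firewall or load balancer that drops its state for an idle connection can produce the same symptom without either endpoint closing anything.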

You should definitely begin by looking at how often you see these messages for a given IP on a given indexer. What are the timing patterns? It would also be useful to see what the forwarder side reports in the same time window.

This probably becomes more of a tech support issue, but it is possible that it's really just a network configuration issue in your environment, which is hard for the application to analyze. I think pursuing both is the right strategy.
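One way to get those per-IP counts outside of Splunk is a small script run against each indexer's log. This is only a sketch: the regular expression is built from the single sample line quoted in the question, and the names `LINE_RE` and `count_resets` are made up for illustration.

```python
import re
from collections import Counter

# Pattern derived from the one sample line in the question; other
# TcpInputProc messages may be worded differently.
LINE_RE = re.compile(
    r"^(?P<date>\d{2}-\d{2}-\d{4}) (?P<hour>\d{2}):\d{2}:\d{2}\.\d{3} "
    r"ERROR TcpInputProc - .*?host=(?P<host>[^,\s]+), "
    r"ip=(?P<ip>\d+(?:\.\d+){3})\. Winsock error 10054"
)


def count_resets(lines):
    """Count 10054 errors per (ip, date, hour) bucket."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match["ip"], match["date"], match["hour"])] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_resets.py <path to the indexer's log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for (ip, date, hour), n in count_resets(log).most_common(20):
            print(f"{ip}  {date} {hour}:00  {n}")
```

Bucketing by hour is an arbitrary choice. A finer bucket may make a fixed interval easier to spot, such as an idle timeout on a network device, and running the same script on every indexer shows whether one of them sees more resets than the others.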

erga00 · 02-24-2011

I've done some analysis of the timing, pattern, etc. of the errors and found nothing that stands out, other than that the error count is related to the volume of data being forwarded. I'd have expected forwarders relaying over a WAN link to have more errors, but that's not the case. The worst offender is on the same LAN segment as the indexers.

I'm going to start with the Networking team and see what, if any, network devices sit between the indexers and forwarders.

Thanks for the insight.
