Why did our Splunk Windows server suddenly start disappearing off our network with Event 4227 (TCP/IP stack 'crash')?

rscwebops
Engager

Hi,

Our local Splunk server has been working fine for months, but suddenly it started momentarily 'disappearing' off the network.

When we checked the Event log we found the following:

Log Name:      System
Source:        Tcpip
Date:          18/03/2015 04:34:50
Event ID:      4227
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      XXXXXXX.XXX-XX.XXX
Description:
TCP/IP failed to establish an outgoing connection because the selected local endpoint was recently used to connect to the same remote endpoint. This error typically occurs when outgoing connections are opened and closed at a high rate, causing all available local ports to be used and forcing TCP/IP to reuse a local port for an outgoing connection. To minimize the risk of data corruption, the TCP/IP standard requires a minimum time period to elapse between successive connections from a given local endpoint to a given remote endpoint.

Event Xml:

<Provider Name="Tcpip" />
<EventID Qualifiers="32768">4227</EventID>
<Level>3</Level>
<Task>0</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2015-03-18T04:34:50.945Z" />
<EventRecordID>259309</EventRecordID>
<Channel>System</Channel>
<Computer>XXXXXXX.XXX-XX.XXX</Computer>
<Security />


<Data>
</Data>
<Binary>00000000010000000000000083100080000000000000000000000000000000000000000000000000</Binary>

Restarting the server has fixed it, but does anyone know why this might have happened and, more importantly, what could be done to prevent it from happening again?

Regards

Mark
ICT
Royal Society of Chemistry
Cambridge, UK

geoppspl7
New Member

Explore decreasing TcpTimedWaitDelay and increasing MaxUserPort. Both are REG_DWORD values located under HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters. Set TcpTimedWaitDelay to 30 (seconds), as opposed to the default of 4 minutes, and MaxUserPort to 65534.

Also expand the dynamic port range using:
netsh int ipv4 set dynamicport tcp start=10000 num=50000
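
For a feel of why both settings matter, here is a rough back-of-the-envelope sketch in Python. It assumes the stock dynamic range of 16384 ports (49152 to 65535) on Windows Vista / Server 2008 and later, uses the 4-minute default quoted above, and treats every closed connection as holding its local port for the full TIME_WAIT period. The function name is made up for illustration, and the ceiling applies to connections going to a single remote endpoint.

```python
def max_sustained_rate(num_ports: int, time_wait_seconds: int) -> float:
    """Rough upper bound on new outgoing connections per second to one
    remote endpoint before every local port is stuck in TIME_WAIT.

    Assumption: each closed connection occupies its local port for the
    whole TIME_WAIT period and no port is released early.
    """
    return num_ports / time_wait_seconds


# Assumed stock values: 16384 dynamic ports, 240 s (4 min) TIME_WAIT.
default_rate = max_sustained_rate(16384, 240)

# Values suggested in this reply: num=50000 ports, TcpTimedWaitDelay=30 s.
tuned_rate = max_sustained_rate(50000, 30)

print(f"default: ~{default_rate:.0f} connections/s")
print(f"tuned:   ~{tuned_rate:.0f} connections/s")
```

Under those assumptions the stock settings top out at roughly 68 new connections per second to one destination, versus well over a thousand per second with the values suggested here. If something on the box opens and closes connections faster than that ceiling, Event 4227 is what you would expect to see.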

rsennett_splunk
Splunk Employee

Something to consider:
https://support.microsoft.com/en-us/kb/2901197

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

rscwebops
Engager

Thanks for that, we've taken a look and decided to install the hotfix due to the nature of the error and the fact that this is a production machine.

Thankfully it's a VM, so it was snapshotted beforehand and we can roll back in a day or two if the hotfix doesn't help.

Mark
