Why is my Universal Forwarder showing extreme lag or latency when sending Windows event log data?

hsrawat
Explorer

A Windows 2008 R2 Universal Forwarder and its indexer are located in different geographical locations. Events arrive hours behind.

There's no limit on outgoing forwarder throughput.

Clearing the Windows Security log allowed the events to catch up for a short while, but they quickly fell behind again.

1 Solution

hsrawat
Explorer

The issue is caused by small Windows events that are smaller than the TCP maximum segment size (MSS). Nagle's algorithm on the UF combined with TCP delayed ACK on the indexer significantly reduces throughput. It gets worse on a WAN compared to a LAN. The default 8 KB DefaultSendWindow size becomes the bottleneck.

Setting the DefaultSendWindow Windows registry value to a reasonably higher value lets events be indexed in near real time.
How to calculate an appropriate value for DefaultSendWindow?

Send buffer size (bytes) = desired throughput (bytes/sec) * round-trip latency (sec)
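
As a rough worked example of the formula above, here is a minimal Python sketch. The 10 Mbit/s target and the 100 ms round-trip time are made-up numbers, and both function names are illustrative. It also shows the ceiling that a single 8 KB window per round trip would impose on the same link.

```python
def send_buffer_bytes(throughput_bits_per_sec: float, rtt_seconds: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to sustain the target rate."""
    return round(throughput_bits_per_sec / 8 * rtt_seconds)


def max_throughput_bytes_per_sec(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when at most one window is sent per round trip."""
    return window_bytes / rtt_seconds


# Hypothetical WAN link: 10 Mbit/s desired, 100 ms round-trip latency.
needed = send_buffer_bytes(10_000_000, 0.100)
# Ceiling imposed by an 8 KB send window on the same hypothetical link.
ceiling = max_throughput_bytes_per_sec(8 * 1024, 0.100)

print(needed)   # 125000 bytes
print(ceiling)  # 81920.0 bytes/sec, roughly 80 KB/s
```

Measure the real round-trip time between forwarder and indexer (for example with ping) before picking a value; the numbers here are only placeholders.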

More info
https://www.switch.ch/network/tools/tcp_throughput/?do+new+calculation=do+new+calculation (use bottom calculator)
http://www.speedguide.net/faq/what-is-the-bandwidth-delay-product-185
http://www.kehlet.cx/articles/99.html
http://web.archive.org/web/20080803082218/http://dast.nlanr.net/Guides/GettingStarted/TCP_window_siz...
https://www.switch.ch/network/tools/tcp_throughput/

How to set the Windows registry value?
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Afd\Parameters
DefaultSendWindow
Value Type: REG_DWORD

https://technet.microsoft.com/en-us/library/cc781532%28v=ws.10%29.aspx

You need to restart the Windows machine after setting the value.
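
For anyone who prefers to script the registry change, the sketch below generates the text of a .reg file for the DefaultSendWindow value under the Afd\Parameters key named above. The 125000-byte value and the helper name render_reg_file are illustrative assumptions, not recommended settings. Review the output before importing it, and note that it only produces text; it does not touch the registry.

```python
AFD_PARAMETERS_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Afd\Parameters"


def render_reg_file(send_window_bytes: int) -> str:
    """Return .reg file text that sets DefaultSendWindow to the given byte count."""
    if not 0 < send_window_bytes <= 0xFFFFFFFF:
        raise ValueError("DefaultSendWindow must fit in a REG_DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{AFD_PARAMETERS_KEY}]",
        # REG_DWORD values are written as eight hexadecimal digits.
        f'"DefaultSendWindow"=dword:{send_window_bytes:08x}',
        "",
    ]
    return "\r\n".join(lines)


reg_text = render_reg_file(125000)  # illustrative value only
print(reg_text)
```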

hrawat_splunk
Splunk Employee

The tcpSendBufSz setting is now available in outputs.conf and is the preferred way to fix this issue, instead of setting the DefaultSendWindow registry value.

jenipherc
Splunk Employee

One of the easiest ways to identify that you have this problem is by looking at the max_age of those Windows events in Splunk's metrics.log.

index=_internal host=<forwarder> source=*metrics.log* group=per_sourcetype_thruput |timechart avg(max_age) by series useother=f

If all wineventlog:* sourcetypes show a large avg(max_age) on your forwarder, and you have already adjusted the following on the forwarder:

- evt_resolve_ad_obj is set to 0 in inputs.conf
- maxKBps is set to 0 in limits.conf

and there is still a lag, consider setting tcpSendBufSz as suggested elsewhere in this thread.

mkolkebeck
Path Finder

Note also that if indexer acknowledgement is enabled on the forwarder, the high latency may produce an unusually large number of these events: "TcpOutProc - Read operation timed out expecting ACK from xxx.xxx.xxx.xxx:xxxx in 300 seconds" and "TcpOutProc - Possible duplication of events with channel=...". These further complicate the problem of the Universal Forwarder falling behind on events.

dfronck
Communicator

I downvoted the registry-based answer because tcpSendBufSz lets you fix this without editing the registry. tcpSendBufSz was introduced after the initial answer.

dfronck
Communicator

Instead of changing the registry, could we just add this setting to the UF config on the servers?

tcpSendBufSz = 16384

  • TCP send buffer size in bytes.
  • Useful to improve thruput with small size events like windows events.
  • Only set this value if you are a TCP/IP expert.
  • Defaults to system default.

hrawat_splunk
Splunk Employee

tcpSendBufSz was introduced after this post. So yes, setting tcpSendBufSz has the same effect as changing the registry value.
This setting applies SO_SNDBUF via setsockopt on the forwarder side only.
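
To illustrate what setting SO_SNDBUF on a socket looks like in general, here is a small Python sketch. It is not Splunk's code, and the 16384 value simply reuses the number from the question above. The operating system may adjust the requested size (Linux, for example, reports a doubled value), so the sketch reads the effective size back.

```python
import socket

requested = 16384  # illustrative value, same as in the question above

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    # Size the OS assigns when nothing is configured ("system default").
    default_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    # Request a specific send buffer size for this socket only.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
    # Read back what the OS actually applied.
    effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)

print(f"default={default_size} requested={requested} effective={effective}")
```

Because the option is set per socket, it affects only the connections opened by the process that sets it, unlike the machine-wide registry default.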
