
Extreme latency with indexed events

rbal_splunk
Splunk Employee

We have about 200 Heavy Forwarders sending data to four indexers.
All Heavy Forwarders are on Splunk version 5.0.2 and the indexers are on 6.0.
For some of the Heavy Forwarders, queues are consistently blocked, causing extreme latency in forwarding data to the indexers. The tcpout queue remains blocked, which in turn blocks the other queues. The indexers have no backlog at the time.
The outgoing throughput from a Heavy Forwarder is around 512 KB/s.
Verified that the thruput limit is set to unlimited:

limits.conf -----
[thruput]
maxKBps = 0
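
If useful, the effective setting can also be confirmed with btool on the forwarder. This is a generic check rather than output from this case, and the path assumes a default $SPLUNK_HOME:

$SPLUNK_HOME/bin/splunk btool limits list thruput --debug

The --debug flag shows which file each value comes from, so you can confirm maxKBps = 0 is actually winning over any app-level override.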

Validated key attributes from outputs.conf:
[tcpout:out]
maxQueueSize = 7MB
autoLBFrequency = 30
compressed = true
server = xx:22601,yy:22601,zz.com:22601,aa:22601
useACK = true
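
To confirm which queues are blocking on the Heavy Forwarders, a search against the forwarders' metrics.log can help. This is a generic troubleshooting search, not output captured from this case:

index=_internal source=*metrics.log* group=queue blocked=true
| stats count by host, name

The name field identifies the blocked queue (for example the tcpout queue versus the parsing queues), which helps confirm the backpressure is starting at the output side.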

We measured the bandwidth capacity for splunk-to-splunk communication between Heavy Forwarders (with and without latency) and the indexers. The test was performed as described at http://twiki.splunk.com:9000/twiki/bin/view/Main/MeasuringBandwidth. Transfer rates varied between 8319 KB/s and 20667.71 KB/s.
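
For reference, a raw TCP transfer test along these lines can be run with dd and nc (the exact procedure on the twiki page is not reproduced here; the host name and port 22699 below are placeholders, and nc option syntax varies between netcat variants):

on the indexer side:    nc -l 22699 > /dev/null        (or nc -l -p 22699)
on the forwarder side:  dd if=/dev/zero bs=1M count=1000 | nc <indexer-host> 22699

dd reports the transfer rate when it finishes, which gives a rough ceiling for splunk-to-splunk throughput over that link independent of Splunk itself.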


rbal_splunk
Splunk Employee

As far as a solution goes, the simple recommendation would be to convert the Heavy Forwarders to Universal Forwarders, as throughput with the Universal Forwarder is much better.

In this case the number of full (heavy) forwarders is around 200. The way fully cooked splunk-to-splunk data is handled, i.e. receiving and parsing the cooked s2s data in the same thread, is likely what leads to the slow throughput numbers. The imbalance of 200 HWF --> 4 IDX is likely causing the slowdown; typically 200 UF --> 4 IDX would not be an issue.
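
As a rough back-of-the-envelope illustration (using the figures reported above, not measured limits): if each of the 200 Heavy Forwarders is sending around 512 KB/s, that is on the order of 100 MB/s of fully cooked data arriving at 4 indexers, i.e. roughly 25 MB/s per indexer, all of which has to pass through the single splunk-to-splunk deserialization thread mentioned above.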

In case the transition from Heavy Forwarder to Universal Forwarder is not an option, we would expect throughput numbers to improve by adding additional indexer instances on the same physical boxes (assuming the existing instances are barely doing any work). This simply allows more indexer processes to work through the splunk-to-splunk data (unfortunately there is just one thread doing splunk-to-splunk deserialization work).
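
For illustration only: if a second indexer instance were added on each existing box, listening on an additional receiving port (the 22602 ports below are hypothetical), the forwarders' outputs.conf would just list the extra targets so auto load balancing spreads connections across all instances:

[tcpout:out]
server = xx:22601,xx:22602,yy:22601,yy:22602,zz.com:22601,zz.com:22602,aa:22601,aa:22602
autoLBFrequency = 30
compressed = true
useACK = true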

In addition, the recommendation would be to upgrade all the Heavy Forwarders to version 5.0.4 or later, as some critical bugs were fixed in 5.0.4 and subsequent releases.
