How to prevent congestion between Heavy Forwarders and Indexers?

vr2312 · 05-11-2017

Yesterday we observed that the indexing queue on our indexers was more than 90% full.

This resulted in failed connections between Heavy Forwarders (HF) and Indexers.

Once the indexing queue receded, data from HFs started flowing to indexers and data was then written to disks.

I have a few questions regarding this:

1. Our environment hosts Splunk IT Service Intelligence and Splunk Enterprise Security, which are both premium apps. Could the searches targeting the indexers also be a cause of the blocked queues?
2. What is the maximum number of TCP connections an indexer can accept?
3. Any input on how to avoid such cases in the future?

woodcock · 05-11-2017

Due to MAJOR improvements in the S2S and the Universal Forwarder build, if you are on v6 (particularly later versions of v6), then you should only be using HFs for things like DBConnect. For things like syslog, you should DEFINITELY be using a Universal Forwarder. This is the answer to #3.

vr2312 · 05-11-2017

This is our infrastructure:

Servers -> UF -> HF -> Indexers
Desktops -> UF -> HF -> Indexers
Syslog Servers -> HF -> Indexers
DBConnect HF -> Indexers

We are on version 6.4.4.

woodcock · 05-11-2017

Your architecture is very v4 and is now an albatross around your bottleneck. In the updated v6 hotness it should be like this:
Servers -> UF -> Indexers
Desktops -> UF -> Indexers
Syslog Servers -> UF -> Indexers
DBConnect HF -> Indexers
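For the syslog row, one common way to get there (an assumption on my part, not something spelled out in this thread) is to let the syslog daemon write per-host files to disk and have a UF on the same box monitor them. A minimal sketch, where the directory layout is hypothetical:

```ini
# inputs.conf on the UF installed on each syslog server
# Assumes the syslog daemon writes to /var/log/remote/<sending-host>/<file>.log
# (hypothetical layout; adjust the path to whatever your daemon actually writes).
[monitor:///var/log/remote/*/*.log]
sourcetype = syslog
# Take the host name from the 4th path segment: /var(1)/log(2)/remote(3)/<host>(4)
host_segment = 4
```

The point of the sketch is only that the UF reads the files and ships them straight to the indexers, with no HF in between.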

The key on all the UFs is to set autoLB=true in outputs.conf and also EVENT_BREAKER_ENABLE and EVENT_BREAKER in props.conf for the sourcetype of every input, to ensure proper balancing. Do not use external Load Balancers, either.
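To make that concrete, here is a rough sketch of what the UF-side settings could look like. The host names, the group name and the sourcetype name are placeholders I made up, and the EVENT_BREAKER regex is just the plain "break on newlines" case, so treat it as a starting point rather than a drop-in config:

```ini
# outputs.conf on each universal forwarder
# idx*.example.com and "primary_indexers" are placeholders, not values from this thread.
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
# List every indexer directly; no external load balancer in front of them.
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
autoLB = true
# Seconds between switches to another indexer (30 is the documented default).
autoLBFrequency = 30
# Indexer acknowledgment
useACK = true
```

```ini
# props.conf on the same universal forwarder
# "my_sourcetype" is a placeholder; repeat the stanza for each sourcetype the UF sends.
[my_sourcetype]
EVENT_BREAKER_ENABLE = true
# Regex marking where one event ends and the next begins.
EVENT_BREAKER = ([\r\n]+)
```

As far as I know the EVENT_BREAKER settings only exist on 6.5 and later forwarders, so 6.4.4 UFs would probably need an upgrade first. Worth checking the props.conf spec for your version before rolling this out.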

vr2312 · 05-11-2017

Thank you for your input, @woodcock. Is there any documentation where this is published, so that I can read through it before making these major changes?

Judging by your response, you are asking me to remove the HF tier completely. Am I getting this right?

autoLB is already set to true, with indexer acknowledgment enabled.

woodcock · 05-11-2017

Keep HF for DBConnect only and yes, ditch the rest. The documentation about this evolution is not as clear as it should be, but all of the testing that I have seen mirrors the PS scuttlebutt/buzz that I have been hearing: best practices have evolved to exclude HFs except in a very few extreme circumstances. Here are a few places where there is some documentation:

https://www.splunk.com/blog/2014/03/18/time-based-load-balancing.html
http://docs.splunk.com/Documentation/Forwarder/6.6.0/Forwarder/Configureloadbalancing
https://docs.splunk.com/Documentation/Splunk/6.6.0/Admin/Outputsconf
forceTimebasedAutoLB = [true|false]
* Forces existing streams to switch to newly elected indexer every AutoLB cycle.
* On universal forwarders, use the EVENT_BREAKER_ENABLE and EVENT_BREAKER settings in props.conf rather than forceTimebasedAutoLB for improved load balancing, line breaking, and distribution of events.
