Getting Data In

How to troubleshoot error "forwarding to indexer group default-autolb group blocked for N seconds"?

msantich
Path Finder

Hello,

We have a Linux server running a Splunk forwarder, which forwards to one of two heavy forwarders in an autolb configuration.
The Splunk forwarder reports that it connects to the heavy forwarder, but I get a message in splunkd.log that says:

forwarding to indexer group default-autolb group blocked for <nnnnn> seconds. 
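For context, the forwarder side follows the usual autolb pattern in outputs.conf; roughly like this (the hostnames here are placeholders):

# outputs.conf on the forwarding host (sketch only; hostnames are placeholders)
[tcpout]
defaultGroup = default-autolb-group

[tcpout:default-autolb-group]
# two heavy forwarders; autoLB distributes traffic between them
server = heavyfwd1.example.com:8081, heavyfwd2.example.com:8081
# the receivers acknowledge data, so a blocked receiver can be detected
useACK = true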

From the point of view of the deployment monitor running on the indexer, the Splunk forwarder in question is "missing".

Please help us diagnose this problem, as we have a customer demo tomorrow.

thank you

1 Solution

msantich
Path Finder

jkat54 - thanks for your response. Here is some more data.

I'm seeing the light forwarders connect intermittently to the heavy forwarders, but the connections keep dropping.

On the light forwarders, I'm getting errors like:

Read operation timed out expecting ACK from ...
Possible duplication of events with channel=source ... offset = ... on host ...
Raw connection to ... timed out
Forwarding blocked ...
Applying quarantine to ...
Removing quarantine from ...

On the heavy forwarders, I get errors like:

Forwarding to ... blocked

From the point of view of the deployment monitor, all the light forwarders in the system keep toggling between active and missing.
When I run ./splunk list forward-server on the light forwarders, I do not get consistent results.

We're using SSL. netstat reports connections on port 8081 (light forwarders to heavy forwarders) and 8082 (heavy forwarders to the indexer).
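For completeness, the receiving side on the heavy forwarders is configured along these lines in inputs.conf; the cert path and password are placeholders, and attribute names vary a bit between Splunk versions:

# inputs.conf on each heavy forwarder (sketch only; cert path and password are placeholders)
[splunktcp-ssl:8081]
disabled = 0

[SSL]
serverCert = $SPLUNK_HOME/etc/auth/server.pem
sslPassword = <cert password>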

Thanks.

Michael.


msantich
Path Finder

We can close this. Of the many servers (Splunk light forwarders) that were failing to report, I rebooted one of the ones logging all the forwarding-blocked error messages. Within 2 minutes the other servers began reporting in, and within 15 minutes all 34 servers in the domain had successfully reported and forwarded a day's worth of data to the heavy forwarders.

Though the issue is fixed, I'd like to know whether something we did, or something in our config, caused this to happen. Is there a tuning parameter set too tight, for example?
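For reference, the parameters I have in mind are the ACK and timeout settings in outputs.conf. A sketch of the relevant ones (the values shown are, as far as I can tell, the documented defaults):

# outputs.conf settings that interact with ACKs and blocked forwarding (sketch only)
[tcpout:default-autolb-group]
useACK = true
# seconds to wait on a read (including ACKs) before dropping the connection
readTimeout = 300
# seconds to wait when establishing a connection to a receiver
connectionTimeout = 20
# how often, in seconds, autoLB switches to another receiver
autoLBFrequency = 30
# size of the in-memory output queue; too small a queue blocks quickly
maxQueueSize = auto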

Thanks again to jkat54.

Thanks for any feedback you can give here.


jkat54
SplunkTrust

Nothing strikes me as being 'the problem'. Believe it or not, restarting to fix the problem works fairly often.

In your case I would set up an alert that monitors your _internal index and fires if the condition occurs again. At least you'll know the fix next time it happens. If it keeps happening, I would continue digging and open a support ticket.
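A minimal version of that alert, scheduled every few minutes, might look like this (the search terms match the blocked-forwarding message; adjust the time window and threshold to taste):

index=_internal sourcetype=splunkd "blocked for" "indexer group"
| stats count by host

Trigger the alert whenever the search returns results; each row is a host whose output queue is blocked again.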
