Splunk Cloud Platform

All Heavy Forwarders have this error: TCPOutAutoLB-0 - More than 70% forwarding destinations have failed.

Ichan
Loves-to-Learn Everything

We have Prod and Non Prod environments. About two weeks ago this issue started appearing in our Non Prod environment. I have compared the outputs.conf files across all my HFs and found no problems, I can telnet to the indexers from the HFs, and the certificates are in order. I have also compared the Non Prod outputs.conf files to my Prod outputs.conf files. Searching online has turned up nothing about this. Has anyone come across it before? Any help would be appreciated!
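For reference, the connectivity checks I ran looked roughly like this (the hostname, port, and CA file path below are placeholders, not our real values):

    # Basic TCP reachability from the HF to an indexer (9997 is our forwarding port):
    telnet idx1.example.splunkcloud.com 9997

    # Verify the TLS handshake and certificate chain against our CA bundle:
    openssl s_client -connect idx1.example.splunkcloud.com:9997 \
        -CAfile /opt/splunk/etc/auth/mycerts/cacert.pem

Both succeed from every HF.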

[screenshot: Ichan_1-1655879579200.png]


isoutamo
SplunkTrust

Hi

Do you have the Monitoring Console (MC) up and running for your environment? If not, please set it up.

With the MC you can check the state of the input (and other) queues on your indexers. That said, you would usually get a slightly different warning if those queues were blocked.
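For example, a search along these lines over _internal shows queue fill over time (standard metrics.log fields; narrow the host filter for your environment):

    index=_internal source=*metrics.log* group=queue
    | eval fill_pct = round(current_size_kb / max_size_kb * 100, 2)
    | timechart span=5m max(fill_pct) by name

A queue sitting near 100% for long stretches is the one to investigate.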

Have you checked on the OS side that there are no issues with interfaces etc. (bad cables/switches/ports)? Or have there been any changes in your infrastructure, including network gear?

r. Ismo

Ichan
Loves-to-Learn Everything

Hi R. Ismo,

Thank you for your suggestions. I looked in the CMC (Splunk Cloud) and it shows no issues with the indexers. There was a recent network issue that was resolved a couple of days ago, but this TCPOutAutoLB-0 error started at least a week before that.

I had submitted a ticket to Splunk for assistance and they gave me the link below. I'm not sure how it pertains to my issue, as I am able to telnet to the indexers from all the HFs.

https://www.cyberciti.biz/faq/howto-rhel-linux-open-port-using-iptables/
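For what it's worth, the check that article describes boils down to something like this (9997 is just the forwarding port in our setup; our HFs are RHEL, so iptables applies):

    # Confirm no local firewall rule touches the forwarding port:
    sudo iptables -L -n -v | grep 9997

Nothing there looks out of place on our HFs.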


isoutamo
SplunkTrust

It sounds like that AutoLB warning may have been the first sign of your network issue?

If you can send at least some events from all of your gateway forwarders/UFs, then I don't think the local firewall (iptables) is the issue.

Here is one .conf presentation which you can probably use to check whether the issue is local, or where it might be.

https://conf.splunk.com/files/2019/slides/FN1570.pdf
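A quick local check on each forwarder is something like this (path assumes a default /opt/splunk install):

    # Look for queues reporting blocked=true in the forwarder's own metrics.log:
    grep -E 'group=queue.*blocked=true' /opt/splunk/var/log/splunk/metrics.log | tail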

 

Ichan
Loves-to-Learn Everything

Hi R. Ismo,

That document really helped (thank you). At first I had only checked Splunk Cloud, which is why everything looked fine there. I went through all 4 HFs and found that only 1 had the issue (image below). I'm still learning, so based on the document, is it pointing to a parsing issue?

[screenshot: Ichan_0-1656050171042.png]


isoutamo
SplunkTrust

Nope. You should read that picture from right to left. Usually the first queue showing 100% (or some other really high percentage) is the one that actually has the issue/bottleneck. You can look in the docs at all the actions performed in the typing pipeline. Here is a link to an old answer which you could also use: https://community.splunk.com/t5/Getting-Data-In/Diagrams-of-how-indexing-works-in-the-Splunk-platfor...

If the indexing queue is at 100% utilization, then you must start checking the node that this node is sending its events to.
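For example, a search like this (substitute your problem HF's hostname) shows which queue on that host peaks highest:

    index=_internal host=<problem_hf> source=*metrics.log* group=queue
    | stats max(eval(current_size_kb / max_size_kb * 100)) AS max_fill_pct by name
    | sort - max_fill_pct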

regards Ismo

Ichan
Loves-to-Learn Everything

Thank you. I will go through it.
