Getting Data In

How to troubleshoot why one forwarder "could not send data to output queue" after upgrading to Splunk 6.2.2?

AaronMoorcroft
Communicator

Hey Guys

I have a right one here. So I have a bunch of systems in a DMZ forwarding to a heavy forwarder that then forwards on to our main LAN forwarder. All the systems in the DMZ are sending logs as you would expect except for one. At this point, I need to make clear that no firewall changes have been made, no network configs, absolutely nothing network related has been changed.

The systems have all been upgraded to 6.2.2 and, as explained, all are working except one. If I look in the deployment monitor tool, the problem machine is listed and looks to be working fine. If I try to search its logs, that is not the case; there is nothing. I can search the other system logs without issue. I have removed and reinstalled the forwarder multiple times now with no luck.

I am now investigating the logs, which seem to be pointing to some sort of networking issue:

"Connection to host ***.***.***.*** failed", "No connections could be made to the host because the machine actively refused it", and "could not send data to output queue (parsingqueue), retrying", along with "Connection cooked".

As I say, all the other systems are working as expected and no changes have been made to the systems' network / firewall settings. It seems to be some sort of installation corruption, maybe? Has anyone got any ideas what I can do or where I can look to get more of an idea on how to get this resolved?

Edit:

Just an addition, not sure if it means anything, but in the deployment monitor the problem server is listed twice, once under the new version and once under the old. Could there be some sort of conflict on the problem server where it's half installed or something daft?

0 Karma

rphillips_splk
Splunk Employee

It sounds like the queues on your downstream heavy forwarder (or on the HF / indexers further downstream) could be full or blocked, which is why it is refusing additional connections. That would also explain the intermittent nature of the problem, as congestion can come and go. Try running this search against each of the Splunk instances downstream from the UF and see which queues are blocked:

index=_internal host=hosttoinvestigate source=*metrics.log group=queue blocked=true | timechart count limit=0 by name
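If the heavy forwarder and anything downstream of it forward their internal logs (the default behaviour for forwarders), a variation of the same search run from a search head should show which downstream instance and which queue is blocking. This is just a sketch of the same metrics.log check with the host filter widened:

index=_internal source=*metrics.log group=queue blocked=true | stats count by host, name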

0 Karma

AaronMoorcroft
Communicator

Hi

After running the search above it brings back a whole host of logs, most of which, from what I can tell, are similar to the below -

06-19-2015 23:59:24.428 +0100 INFO Metrics - group=queue, name=typingqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=663, largest_size=663, smallest_size=663

Is there a way to increase the maximum size? Or can you advise on a way to resolve this?

0 Karma

rphillips_splk
Splunk Employee

So you are seeing blocked typing queues on your heavy forwarder or indexers? This is typically an indication of heavy regex replacement / field extraction / routing (via props & transforms). I would look in props.conf & transforms.conf on these Splunk instances and see if you can offload some of that regex and field extraction processing to the search heads. Increasing the queue size isn't the right direction to go in for resolving this.
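For illustration only (the sourcetype, class names and regex below are hypothetical), the first stanza is the kind of props.conf work that runs at index time in the typing pipeline on the heavy forwarder, and the second is a search-time equivalent that could live on the search head instead:

# props.conf on the heavy forwarder - index-time work in the typing pipeline
[my_dmz_sourcetype]
SEDCMD-mask_ids = s/\d{6,}/######/g           # regex replacement applied to every event as it is parsed
TRANSFORMS-route_dmz = my_routing_transform   # routing / rewrite rules defined in transforms.conf

# props.conf on the search head - the same field pulled out at search time only
[my_dmz_sourcetype]
EXTRACT-session_id = session=(?<session_id>\d+)   # evaluated only when the data is actually searched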

If there are other queues blocked, the troubleshooting approach would be different. What is the output of this search:

index=_internal host=hosttoinvestigate source=*metrics.log group=queue blocked=true | stats count by name

0 Karma

AaronMoorcroft
Communicator

aggqueue 1068
indexqueue 1194
parsingqueue 278
splunktcpin 525
typingqueue 1350
udpin 80
wel_queue 102

0 Karma

AaronMoorcroft
Communicator

It seems to be working a little better this morning, so I'm not sure if the above results look great or not.

0 Karma

rphillips_splk
Splunk Employee

Can you indicate which instance this output is from and provide similar output for all Splunk instances downstream from the UF? If the downstream nodes have blocked queues, that would be the place to start troubleshooting.

0 Karma

AaronMoorcroft
Communicator

Can you advise on how you would go about increasing the maximum size?
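For reference (and with rphillips_splk's caveat above that raising queue sizes usually just hides the processing bottleneck), the per-queue size can be raised in server.conf on the instance whose queue is reporting blocked=true; the value below is only an example, and splunkd needs a restart afterwards:

# server.conf on the heavy forwarder / indexer showing the blocked queue
[queue=typingQueue]
# the current cap shows up as max_size_kb=500 in the metrics.log line above
maxSize = 2MB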

0 Karma

AaronMoorcroft
Communicator

Update 2 -

I have completely removed 6.2.2, erased anything Splunk related from the registry and rebooted. I then reinstalled 5.0.2, which worked intermittently, and upgraded that to version 6.2.2, which is now also working intermittently.

All the errors are still showing in splunkd.log as they were before, but the forwarder is now spitting the odd few events through here and there. I still have no idea what the issue is or how it has half resolved itself.

0 Karma