Hi
I have a forwarder on AIX running version 4.3.3 that probably has a problem with its parsingQueue.
I see the following in metrics.log:
02-13-2013 16:47:50.219 +0100 INFO Metrics - group=queue, name=parsingqueue, max_size_kb=512, current_size_kb=449, current_size=9, largest_size=9, smallest_size=8
02-13-2013 16:48:21.226 +0100 INFO Metrics - group=queue, name=parsingqueue, max_size_kb=512, current_size_kb=449, current_size=9, largest_size=9, smallest_size=9
splunkd.log contains a lot of:
02-13-2013 17:01:37.238 +0100 INFO TailingProcessor - ...continuing.
02-13-2013 17:01:42.241 +0100 INFO TailingProcessor - Could not send data to output queue(parsingQueue), retrying...
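For reference, the queue fill over time can be charted from those metrics lines with a search along these lines, assuming the forwarder still ships its _internal logs to the indexer (the host value here is a placeholder):

index=_internal source=*metrics.log group=queue name=parsingqueue host=my_aix_forwarder | timechart max(current_size_kb) max(current_size)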
Restarting Splunk does not change the current_size_kb or current_size values, so I tried to increase the queue size following this answer:
http://splunk-base.splunk.com/answers/38218/universal-forwarder-parsingqueue-kb-size
This does increase max_size_kb and current_size_kb, but it does not result in the forwarder sending anything to the indexer.
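For reference, the change described in that answer amounts to something like this in $SPLUNK_HOME/etc/system/local/server.conf on the forwarder (a sketch; the 10MB value is just an example):

[queue=parsingQueue]
maxSize = 10MB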
If current_size indicates how many events are in the queue, then this number is relatively low.
Is there a way to debug what events are stuck in a queue?
Can I somehow manually force the forwarder to empty the queue and drop the events (I know this is ugly)?
Another strange thing is that once in a while (every couple of hours) the logs are suddenly indexed, but I did not find any hints in splunkd.log or metrics.log as to why. There is an identical system with the same configuration that works fine. The indexer is not very busy; it indexes about 30-40GB a day.
Thanks for your help,
Chris
I realize this is an old thread, but in case anyone is running into this, this is how I solve it:
Do a running read of splunkd.log while searching for "while reading":
tail -f /opt/splunk/var/log/splunk/splunkd.log | grep -i "while reading"
Stop Splunk and keep watching the output of the tail command. Whichever file Splunk was reading while it shut down is your trouble file.
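Once the trouble file is identified, a temporary blacklist on the corresponding monitor stanza keeps Splunk from re-reading it until the file is sorted out (a sketch; path and filename are placeholders):

# $SPLUNK_HOME/etc/system/local/inputs.conf on the forwarder
[monitor:///var/log/myapp]
blacklist = stuck_file\.log$

The blacklist regex is matched against the full path of each file under the monitored directory.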
"Can I somehow manually force the forwarder to empty the queue and drop the events (I know this is ugly)?"
Did you find an answer for this? Thanks!
Has anyone managed to purge the queue on (intermediate) forwarders stuck at 100% without reinstalling from scratch?
A restart will clear the queues by default. If you have a specific question, it may make sense to open a new Splunk Answers post on it, as this thread is very old.
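In other words, something like this on the forwarder is usually enough, since the in-memory queues (parsingQueue included) do not survive the process; the exception is persistent queues configured with persistentQueueSize in inputs.conf, which are kept on disk:

$SPLUNK_HOME/bin/splunk restart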
Hi Chris, you just got an email 😉
If this is a forwarder, the problem is usually a step after the parsing queue, i.e. on the output side (blocked indexer queues, or the forwarder's 256KBps thruput limit):
Thanks for replying. The indexer queues (checked via SOS) seem to be OK, and the 256KBps limit is not a problem either: the forwarder's throughput is close to 0 most of the time, and then from time to time it indexes its data (I don't see why it behaves like this). I also see a couple of "WARN TcpOutputProc - Raw connection to ip=" messages.
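For anyone checking the throughput angle: the 256KBps cap is the forwarder-side default in limits.conf and can be raised or removed (a sketch; 0 disables the limit, and the effective value can be verified with splunk cmd btool limits list thruput):

# $SPLUNK_HOME/etc/system/local/limits.conf on the forwarder
[thruput]
maxKBps = 0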