Hi,
I have a universal forwarder monitoring a number of directories and forwarding to an indexer.
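For reference, the setup is the standard monitor-plus-forward pattern. A minimal sketch of the relevant config follows; the monitored path and the tcpout group name are placeholders, while the indexer host/port match the tcpout queue name in metrics.log below:

inputs.conf on the forwarder:
[monitor:///path/to/logs]

outputs.conf on the forwarder:
[tcpout:primary_indexers]
server = sec-mgr-01:9997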
On the forwarder, there are repeating entries in the splunkd.log file:
03-04-2013 12:12:39.503 +0000 INFO TailingProcessor - Could not send data to output queue (parsingQueue), retrying...
03-04-2013 12:12:44.506 +0000 INFO TailingProcessor - ...continuing.
03-04-2013 12:12:54.543 +0000 INFO TailingProcessor - Could not send data to output queue (parsingQueue), retrying...
03-04-2013 12:13:09.551 +0000 INFO TailingProcessor - ...continuing.
03-04-2013 12:13:14.568 +0000 INFO TailingProcessor - Could not send data to output queue (parsingQueue), retrying...
03-04-2013 12:13:19.571 +0000 INFO TailingProcessor - ...continuing.
03-04-2013 12:13:29.607 +0000 INFO TailingProcessor - Could not send data to output queue (parsingQueue), retrying...
03-04-2013 12:13:34.609 +0000 INFO TailingProcessor - ...continuing.
03-04-2013 12:13:49.644 +0000 INFO TailingProcessor - Could not send data to output queue (parsingQueue), retrying...
03-04-2013 12:13:54.647 +0000 INFO TailingProcessor - ...continuing.
etc.
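To see how often the retries happen over time, a search along these lines over the forwarder's internal logs should work, assuming the forwarder's _internal data reaches the indexer (the host name is a placeholder):

index=_internal host=my-forwarder source=*splunkd.log* "Could not send data to output queue"
| timechart count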
The main effect of this seems to be a delay of roughly 10 minutes before data becomes searchable.
I do not believe the indexer is the bottleneck. I have Splunk on Splunk installed, and according to that the indexer's queues are pretty much at zero.
I have increased the persistent queue size to 100 MB on the forwarder, but I still get the error.
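The increase was made in server.conf on the forwarder, along these lines (stanza reproduced from memory, which matches the max_size_kb=102400 in the metrics below):

[queue=parsingQueue]
maxSize = 100MB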
The metrics.log on the forwarder shows that the queues don't seem to be near full (neither the parsingqueue nor the tcpout queue):
03-04-2013 12:13:42.031 +0000 INFO Metrics - group=queue, name=tcpout_sec-mgr-01_9997, max_size=512000, current_size=65736, largest_size=65736, smallest_size=0
03-04-2013 12:13:42.031 +0000 INFO Metrics - group=queue, name=aeq, max_size_kb=500, current_size_kb=0, current_size=0, largest_size=0, smallest_size=0
03-04-2013 12:13:42.031 +0000 INFO Metrics - group=queue, name=aq, max_size_kb=10240, current_size_kb=0, current_size=0, largest_size=0, smallest_size=0
03-04-2013 12:13:42.031 +0000 INFO Metrics - group=queue, name=auditqueue, max_size_kb=500, current_size_kb=0, current_size=0, largest_size=0, smallest_size=0
03-04-2013 12:13:42.031 +0000 INFO Metrics - group=queue, name=fschangemanager_queue, max_size_kb=5120, current_size_kb=0, current_size=0, largest_size=0, smallest_size=0
03-04-2013 12:13:42.031 +0000 INFO Metrics - group=queue, name=indexqueue, max_size_kb=500, current_size_kb=0, current_size=0, largest_size=0, smallest_size=0
03-04-2013 12:13:42.031 +0000 INFO Metrics - group=queue, name=nullqueue, max_size_kb=500, current_size_kb=0, current_size=0, largest_size=0, smallest_size=0
**03-04-2013 12:13:42.031 +0000 INFO Metrics - group=queue, name=parsingqueue, max_size_kb=102400, current_size_kb=101811, current_size=2434, largest_size=2556, smallest_size=2417**
03-04-2013 12:13:42.031 +0000 INFO Metrics - group=queue, name=tcpin_queue, max_size_kb=500, current_size_kb=0, current_size=0, largest_size=0, smallest_size=0
CPU usage is low on both boxes.
On the forwarder, splunk list monitor | wc -l gives 14264.
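Given that many monitored files, the TailingProcessor's per-file status can also be dumped via the REST endpoint, in case that helps diagnosis (endpoint path from memory, may vary by version):

splunk _internal call /services/admin/inputstatus/TailingProcessor:FileStatus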
On the indexer, metrics.log has no instances of blocked=true.
On the forwarder, metrics.log has a few instances of blocked=true, but current_size is always low compared to max_size_kb:
Example:
Metrics - group=queue, name=parsingqueue, blocked=true, max_size_kb=102400, current_size_kb=102399, current_size=1682, largest_size=1689, smallest_size=1662
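To summarize where the blocking happens, a search along these lines over the internal index should break it down by host and queue (assuming both boxes forward their internal logs):

index=_internal source=*metrics.log* group=queue blocked=true
| stats count by host, name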
Any ideas would be really appreciated. I don't know what is causing the slowness or how to fix it.