On a universal forwarder that is apparently sending data, there are a large number (5.5k) of blocked=true queue messages for indexqueue and auditqueue. Why is indexqueue showing up on a universal forwarder?
03-14-2014 14:50:44.502 +0000 INFO Metrics - group=queue, name=auditqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=1109, largest_size=1109, smallest_size=1109
03-14-2014 14:50:44.502 +0000 INFO Metrics - group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=1110, largest_size=1110, smallest_size=1110
First off, the index queue exists on universal forwarders only as the holding queue in front of the TCPout processor, which is responsible for sending the data out to the configured receivers. No actual indexing activity is in the picture, of course.
In all likelihood, one of three things is happening:
The UF's self-imposed outgoing throughput limit (the "maxKBps" key in the "thruput" stanza of limits.conf) is kicking in and throttling the emission of data. Note that the default value for this setting is only 256 KBps.
The indexer(s) downstream are saturated and cannot keep up with the current aggregate indexing rate. As a result, their queues fill up and they push back on the forwarders. The next steps in this investigation are to examine the pattern of event-processing queue saturation and of CPU usage by splunkd processors, which can be done with the "Indexing Performance" and "Distributed Indexing Performance" views provided with the S.o.S app.
If indexers are fine and no local throughput limit is in place, this could be a network throughput limitation between the UF and the indexers. This is rare, but does happen. It's also not easy to demonstrate, unfortunately.
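To rule out the first cause, check the effective throughput cap on the UF and raise or remove it if needed. A minimal sketch of the relevant limits.conf stanza (the value shown is an example, not a recommendation for every environment):

```
# $SPLUNK_HOME/etc/system/local/limits.conf on the UF
[thruput]
# Default is 256 (KBps) on a universal forwarder.
# 0 removes the limit entirely; restart splunkd for the change to take effect.
maxKBps = 0
```

You can confirm the value the forwarder is actually using with `splunk btool limits list thruput` on the UF.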
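To check for the second cause, a search over the indexers' internal metrics will show which queues are blocking and where. This assumes you run the search on the indexers (or that their _internal data is searchable from wherever you run it):

```
index=_internal source=*metrics.log* group=queue blocked=true
| stats count by host, name
| sort - count
```

Heavy blocking on the indexers' own queues (e.g. indexqueue, typingqueue) points at indexer saturation rather than a problem on the forwarder.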