Getting Data In

Is there a way to determine the size of the AQ queue?

williamsweat
Path Finder

I'm getting messages similar to those posted in this question for a blocked AQ.

Is there a way to track down the source of the backlog also?

Tags (1)
1 Solution

hexx
Splunk Employee
Splunk Employee

If you want to track the size of the archive processing queue, you could run the following search:

index=_internal host=host_reading_archives source=*metrics.log group=queue name=aq earliest=-24h | timechart span=15m perc95(current_size), max(max_size)

Feel free to change the values of "earliest" and "span" to better fulfill your needs.

Note that we are asking for the 95th percentile of the "current_size" field and the maximum value of the "max_size" field to produce a good statistical representation of what's going on.
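If you'd rather inspect the raw metrics.log events outside Splunk, the same statistics can be computed directly. This is a minimal sketch, assuming the typical key=value layout of queue events in metrics.log (the sample lines and the nearest-rank percentile are illustrative approximations, not Splunk's exact implementation of perc95):

```python
import re
import math

# Hypothetical sample lines in the style of Splunk's metrics.log queue events.
sample_lines = [
    "04-25-2023 12:00:11.000 -0700 INFO Metrics - group=queue, name=aq, max_size=1000, current_size=873",
    "04-25-2023 12:00:42.000 -0700 INFO Metrics - group=queue, name=aq, max_size=1000, current_size=912",
    "04-25-2023 12:01:13.000 -0700 INFO Metrics - group=queue, name=aq, max_size=1000, current_size=640",
]

def parse_queue_metrics(lines, queue_name="aq"):
    """Extract (max_size, current_size) pairs for one queue from metrics.log-style lines."""
    pattern = re.compile(r"group=queue, name=(\w+).*?max_size=(\d+).*?current_size=(\d+)")
    rows = []
    for line in lines:
        m = pattern.search(line)
        if m and m.group(1) == queue_name:
            rows.append((int(m.group(2)), int(m.group(3))))
    return rows

def perc95(values):
    """Nearest-rank 95th percentile, a rough stand-in for Splunk's perc95()."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered)) - 1
    return ordered[rank]

rows = parse_queue_metrics(sample_lines)
current_sizes = [cur for _, cur in rows]
max_sizes = [mx for mx, _ in rows]
print(perc95(current_sizes), max(max_sizes))
```

A sustained perc95(current_size) close to max(max_size) is the signature of a saturated queue.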

As for remediation of the blockage of the archive processing queue, the answer and comments from Stephen Sorkin in the post you refer to are still valid. I would add a few points:

  • If at all possible, feed the files to Splunk uncompressed. Uncompressed files can be processed in parallel, whereas archives have to be processed serially.

  • Find out if there are other blocked queues downstream which might be the actual culprit, propagating the clogging upstream. You can do this by running a search like index=_internal host=host_reading_archives source=*metrics.log group=queue | timechart span=15m perc95(current_size) by name. Change "span" or the value of "host" as needed, depending on where (the forwarder reading the archives? the indexer committing events to disk?) and when you want to look at the size of the event-processing queues.

  • Optimize event-processing for the sourcetypes indexed from the archive files by declaring explicit line-breaking rules (use LINE_BREAKER in props.conf) and time-stamp extractions (use TIME_FORMAT, TIME_PREFIX and MAX_TIMESTAMP_LOOKAHEAD in props.conf). See props.conf.spec - http://www.splunk.com/base/Documentation/latest/Admin/Propsconf - for details.

  • If your architecture allows it, split the task of reading archive files between several Splunk forwarders.
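The line-breaking and time-stamp settings mentioned above might look like the following props.conf stanza. This is a sketch for a hypothetical sourcetype; the regexes and time format must match your actual event layout:

```ini
# Hypothetical stanza for a sourcetype indexed from archive files.
# Explicit rules let Splunk skip line-merging and timestamp guessing.
[my_archived_sourcetype]
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
TIME_PREFIX = ^\[
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 21
```

Pinning these down avoids per-event autodetection, which is one of the more expensive steps in the parsing pipeline.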


williamsweat
Path Finder

Many Thanks. I thought that was the case, but wasn't sure.

0 Karma

hexx
Splunk Employee
Splunk Employee

To paraphrase Stephen Sorkin, seeing this queue at 1000 "means that the file processing code has found more than 1000 archive files that we are processing in turn." I would imagine that this is expected in your case.

0 Karma

williamsweat
Path Finder

Thanks. What does the size mean? Is there a good way to tell if it has become too big? I enabled my input and the AQ queue jumped to over 1,000, while it's normally at 0. I have since disabled it again.

0 Karma