Getting Data In

Is there a way to determine the size of the AQ queue?

williamsweat
Path Finder

I'm getting messages similar to those posted in this question for a blocked AQ.

Is there a way to track down the source of the backlog also?

Tags (1)
1 Solution

hexx
Splunk Employee
Splunk Employee

If you want to track the size of the archive processing queue, you could run the following search:

index=_internal host=host_reading_archives source=*metrics.log group=queue name=aq earliest=-24h | timechart span=15m perc95(current_size), max(max_size)

Feel free to change the values of "earliest" and "span" to better fulfill your needs.

Note that we are asking for the 95th percentile of the "current_size" field and the maximum value of the "max_size" field to produce a good statistical representation of what's going on.
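If you'd rather inspect the raw metrics.log events outside Splunk, the same statistics can be computed directly. This is a minimal sketch, assuming the typical key=value layout of queue events in metrics.log (the sample lines and the nearest-rank percentile are illustrative approximations, not Splunk's exact implementation of perc95):

```python
import re
import math

# Hypothetical sample lines in the style of Splunk's metrics.log queue events.
sample_lines = [
    "04-25-2023 12:00:11.000 -0700 INFO Metrics - group=queue, name=aq, max_size=1000, current_size=873",
    "04-25-2023 12:00:42.000 -0700 INFO Metrics - group=queue, name=aq, max_size=1000, current_size=912",
    "04-25-2023 12:01:13.000 -0700 INFO Metrics - group=queue, name=aq, max_size=1000, current_size=640",
]

def parse_queue_metrics(lines, queue_name="aq"):
    """Extract (max_size, current_size) pairs for one queue from metrics.log-style lines."""
    pattern = re.compile(r"group=queue, name=(\w+).*?max_size=(\d+).*?current_size=(\d+)")
    rows = []
    for line in lines:
        m = pattern.search(line)
        if m and m.group(1) == queue_name:
            rows.append((int(m.group(2)), int(m.group(3))))
    return rows

def perc95(values):
    """Nearest-rank 95th percentile, a rough stand-in for Splunk's perc95()."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered)) - 1
    return ordered[rank]

rows = parse_queue_metrics(sample_lines)
current_sizes = [cur for _, cur in rows]
max_sizes = [mx for mx, _ in rows]
print(perc95(current_sizes), max(max_sizes))
```

A sustained perc95(current_size) close to max(max_size) is the signature of a saturated queue.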

As for remediation of the blockage of the archive processing queue, the answer and comments from Stephen Sorkin in the post you refer to are still valid. I would add a few points:

  • If at all possible, feed the files to Splunk uncompressed. Uncompressed files can be processed in parallel, whereas archives have to be processed serially.

  • Find out if there are other blocked queues downstream which might be the actual culprit, propagating the clogging upstream. You can do this by running a search like index=_internal host=host_reading_archives source=*metrics.log group=queue | timechart span=15m perc95(current_size) by name. Change "span" or the value of "host" as needed, depending on where (the forwarder reading the archives? the indexer committing events to disk?) and when you want to look at the size of the event-processing queues.

  • Optimize event-processing for the sourcetypes indexed from the archive files by declaring explicit line-breaking rules (use LINE_BREAKER in props.conf) and time-stamp extractions (use TIME_FORMAT, TIME_PREFIX and MAX_TIMESTAMP_LOOKAHEAD in props.conf). See props.conf.spec - http://www.splunk.com/base/Documentation/latest/Admin/Propsconf - for details.

  • If your architecture allows it, split the task of reading archive files between several Splunk forwarders.
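The line-breaking and time-stamp settings mentioned above might look like the following props.conf stanza. This is a sketch for a hypothetical sourcetype; the regexes and time format must match your actual event layout:

```ini
# Hypothetical stanza for a sourcetype indexed from archive files.
# Explicit rules let Splunk skip line-merging and timestamp guessing.
[my_archived_sourcetype]
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
TIME_PREFIX = ^\[
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 21
```

Pinning these down avoids per-event autodetection, which is one of the more expensive steps in the parsing pipeline.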


williamsweat
Path Finder

Many Thanks. I thought that was the case, but wasn't sure.

0 Karma

hexx
Splunk Employee
Splunk Employee

To paraphrase Stephen Sorkin, seeing this queue at 1000 "means that the file processing code has found more than 1000 archive files that we are processing in turn." I would imagine that this is expected in your case.

0 Karma

williamsweat
Path Finder

Thanks. What does the size mean? Is there a good way to tell if it has become too big? I enabled my input and the AQ queue jumped to over 1,000, while it's normally at 0. I have since disabled it again.

0 Karma