Getting Data In

Is there a way to determine the size of the AQ queue?

williamsweat
Path Finder

I'm getting messages similar to those posted in this question for a blocked AQ.

Is there a way to track down the source of the backlog also?

1 Solution

hexx
Splunk Employee

If you want to track the size of the archive processing queue, you could run the following search:

index=_internal host=host_reading_archives source=*metrics.log group=queue name=aq earliest=-24h | timechart span=15m perc95(current_size), max(max_size)

Feel free to change the values of "earliest" and "span" to better fulfill your needs.

Note that we are asking for the 95th percentile of the "current_size" field and the maximum value of the "max_size" field to produce a good statistical representation of what's going on.
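To illustrate why the search pairs perc95(current_size) with max(max_size), here is a minimal Python sketch. The sample values are made up (mostly small readings with one spike to 1000, the figure mentioned later in this thread); the point is that a brief spike barely moves the 95th percentile, while max still captures the worst instant:

```python
import math

def perc95(values):
    """Nearest-rank 95th percentile, similar in spirit to Splunk's perc95()."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest-rank position
    return ordered[rank - 1]

# Hypothetical sample of "current_size" readings (group=queue name=aq):
# mostly small values, with one brief spike to 1000.
sizes = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 9, 10, 1000]

print(perc95(sizes))  # 10   -> the typical fill level, robust to the spike
print(max(sizes))     # 1000 -> max alone reports the worst instant
```

Looking at both values together tells you whether the queue is chronically full or merely saw a momentary burst.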

As for remediating the blockage of the archive processing queue, the answer and comments from Stephen Sorkin in the post you refer to are still valid. I would add a few points:

  • If at all possible, feed the files to Splunk uncompressed. Uncompressed files can be processed in parallel, whereas archives have to be processed serially.

  • Find out if there are other blocked queues downstream that might be the actual culprit, propagating the clogging upstream. You can do this by running a search like index=_internal host=host_reading_archives source=*metrics.log group=queue | timechart span=15m perc95(current_size) by name. Change "span" or the value of "host" as needed, depending on where (forwarder reading the archives? indexer committing the events to disk?) and when you want to look at the size of the event-processing queues.

  • Optimize event-processing for the sourcetypes indexed from the archive files by declaring explicit line-breaking rules (use LINE_BREAKER in props.conf) and time-stamp extractions (use TIME_FORMAT, TIME_PREFIX and MAX_TIMESTAMP_LOOKAHEAD in props.conf). See props.conf.spec - http://www.splunk.com/base/Documentation/latest/Admin/Propsconf - for details.

  • If your architecture allows it, split the task of reading archive files between several Splunk forwarders.
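As a sketch of the props.conf tip above: the stanza below is hypothetical — the sourcetype name and timestamp format are placeholders you would replace to match your own archived data.

```ini
# props.conf - hypothetical sourcetype for archived logs; adjust to your data
[my_archived_logs]
# Break events on newlines only; avoids the slower line-merging logic
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# Tell Splunk exactly where and how to find the timestamp
TIME_PREFIX = ^
TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD = 30
```

Explicit settings like these spare the event-processing pipeline from guessing line breaks and timestamps on every event, which is where much of the per-event cost goes.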



williamsweat
Path Finder

Many thanks. I thought that was the case, but wasn't sure.


hexx
Splunk Employee

To paraphrase Stephen Sorkin, seeing this queue at 1000 "means that the file processing code has found more than 1000 archive files that we are processing in turn." I would imagine that this is expected in your case.


williamsweat
Path Finder

Thanks. What does the size mean? Is there a good way to tell whether it has become too big? I enabled my input and the AQ queue jumped to over 1,000, while it's normally at 0. I have since disabled it again.
