
Could not send data to output queue (parsingQueue)

philyeo42
New Member

Hi,

I have universal forwarder monitoring a number of directories and forwarding to an indexer.
On the forwarder, there are repeating entries in the splunkd.log file:

03-04-2013 12:12:39.503 +0000 INFO TailingProcessor - Could not send data to output queue (parsingQueue), retrying...
03-04-2013 12:12:44.506 +0000 INFO TailingProcessor - ...continuing.
03-04-2013 12:12:54.543 +0000 INFO TailingProcessor - Could not send data to output queue (parsingQueue), retrying...
03-04-2013 12:13:09.551 +0000 INFO TailingProcessor - ...continuing.
03-04-2013 12:13:14.568 +0000 INFO TailingProcessor - Could not send data to output queue (parsingQueue), retrying...
03-04-2013 12:13:19.571 +0000 INFO TailingProcessor - ...continuing.
03-04-2013 12:13:29.607 +0000 INFO TailingProcessor - Could not send data to output queue (parsingQueue), retrying...
03-04-2013 12:13:34.609 +0000 INFO TailingProcessor - ...continuing.
03-04-2013 12:13:49.644 +0000 INFO TailingProcessor - Could not send data to output queue (parsingQueue), retrying...
03-04-2013 12:13:54.647 +0000 INFO TailingProcessor - ...continuing.

etc.

The main effect of this seems to be a delay of ~10 minutes before data becomes searchable.

I do not believe the indexer is the bottleneck. I have Splunk on Splunk installed, and according to that the indexer's queues are pretty much empty.

I have increased the persistent queue size to 100 MB on the forwarder but it still gets the error.
The metrics.log on the forwarder shows that the queues don't seem to be near full (either the parsingqueue or the tcpout queue):

03-04-2013 12:13:42.031 +0000 INFO  Metrics - group=queue, name=tcpout_sec-mgr-01_9997, max_size=512000, current_size=65736, largest_size=65736, smallest_size=0
03-04-2013 12:13:42.031 +0000 INFO  Metrics - group=queue, name=aeq, max_size_kb=500, current_size_kb=0, current_size=0, largest_size=0, smallest_size=0
03-04-2013 12:13:42.031 +0000 INFO  Metrics - group=queue, name=aq, max_size_kb=10240, current_size_kb=0, current_size=0, largest_size=0, smallest_size=0
03-04-2013 12:13:42.031 +0000 INFO  Metrics - group=queue, name=auditqueue, max_size_kb=500, current_size_kb=0, current_size=0, largest_size=0, smallest_size=0
03-04-2013 12:13:42.031 +0000 INFO  Metrics - group=queue, name=fschangemanager_queue, max_size_kb=5120, current_size_kb=0, current_size=0, largest_size=0, smallest_size=0
03-04-2013 12:13:42.031 +0000 INFO  Metrics - group=queue, name=indexqueue, max_size_kb=500, current_size_kb=0, current_size=0, largest_size=0, smallest_size=0
03-04-2013 12:13:42.031 +0000 INFO  Metrics - group=queue, name=nullqueue, max_size_kb=500, current_size_kb=0, current_size=0, largest_size=0, smallest_size=0
**03-04-2013 12:13:42.031 +0000 INFO  Metrics - group=queue, name=parsingqueue, max_size_kb=102400, current_size_kb=101811, current_size=2434, largest_size=2556, smallest_size=2417**
03-04-2013 12:13:42.031 +0000 INFO  Metrics - group=queue, name=tcpin_queue, max_size_kb=500, current_size_kb=0, current_size=0, largest_size=0, smallest_size=0

CPU is low on both boxes.

On the forwarder, splunk list monitor | wc -l gives 14264.

On the indexer, metrics.log has no instances of blocked.
On the forwarder, metrics.log has a few instances of blocked=true, but current_size is always low compared to max_size_kb:

Example:
Metrics - group=queue, name=parsingqueue, blocked=true, max_size_kb=102400, current_size_kb=102399, current_size=1682, largest_size=1689, smallest_size=1662

Any ideas would be really appreciated. I don't know what is causing the slowness or how to fix it.


amehta_splunk
Splunk Employee

Indexer discovery used in Multisite clustering
There can be many reasons for this failure, including the ones covered elsewhere in this thread.

An additional cause of this message is indexer discovery with multisite clustering. With multisite clustering, every forwarder must be assigned a site. If you want to avoid site affinity, you can use site0.

The configuration looks like this:

# server.conf
[general]
site = site0
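
After editing server.conf on the forwarder, restart it for the change to take effect:

$SPLUNK_HOME/bin/splunk restart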

References:
1. http://docs.splunk.com/Documentation/Splunk/6.4.3/Indexer/indexerdiscovery#Use_indexer_discovery_in_...

"Important: When you use indexer discovery with multisite clustering, you must assign a site-id to all forwarders, whether or not you want the forwarders to be site-aware. If you want a forwarder to be site-aware, you assign it a site-id for a site in the cluster, such as "site1," "site2," and so on. If you do not want a forwarder to be site-aware, you assign it the special site-id of "site0". When a forwarder is assigned "site0", it will forward to peers across all sites in the cluster."


lguinn2
Legend

Wow, I am humbled to be so opinionated and yet so wrong. Still, I think that 14K files are a lot, and I am not sure why the ignoreOlderThan = 2d wasn't working for you.

Could you be hitting the 256 KBps limit on the universal forwarder? By default, the forwarder limits its network throughput to 256 KBps to avoid saturating the network on a production machine. You can change this by editing $SPLUNK_HOME/etc/system/local/limits.conf:

[thruput]
# 0 means unlimited
maxKBps = 0
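
To check whether the forwarder is actually pushing against this limit, its metrics.log records a thruput group; a rough check (path assumes a default install location) is:

grep "group=thruput" $SPLUNK_HOME/var/log/splunk/metrics.log | tail -5

If instantaneous_kbps sits at around 256, the default limit is probably the bottleneck.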

If you continue to have problems, a call to Splunk Support might be next. You have certainly done your homework!


lguinn2
Legend

If you are monitoring anywhere near 14,000 files on a forwarder, I'll bet that this is your problem. You can increase the file descriptors, etc., but you will probably still have performance issues. A ten-minute delay in indexing is actually pretty darn good considering the work that Splunk is doing. I'll bet that the forwarder is consuming more CPU and memory than it should, too.

Even if only a portion of these files are actively being updated, Splunk will monitor ALL of them. This means that Splunk will examine the mod time of each file in round-robin fashion, over and over again, even though nothing has changed (and maybe never will), because Splunk can't know which files will or won't be updated.

This is obviously a huge waste of machine time if most of the files are not being updated. Here are some steps that you could take (a sketch of the inputs.conf settings follows the list):

  1. Remove the older files.
  2. Rename the older files, perhaps to xyz.OLD. Blacklist files using the regex \.OLD$ and Splunk will skip them.
  3. Use ignoreOlderThan = <time window> in inputs.conf - but BE CAREFUL. ignoreOlderThan causes the monitored input to stop checking files for updates once their modtime has passed this threshold. So if you set it to 14d, you can't ever add a file older than 2 weeks into the directory. (Well, you can, but Splunk will ignore it.)
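
For options 2 and 3, the inputs.conf settings would look roughly like this (the monitor path is just a placeholder for your syslog directories):

[monitor:///var/log/syslog]
# option 2: skip anything renamed to *.OLD
blacklist = \.OLD$
# option 3: stop checking files whose modtime is more than 2 days old (see the caveat above)
ignoreOlderThan = 2d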

If you must monitor this many files, consider installing 2 copies of the forwarder. Split the monitoring between them by assigning them different directories. I would try to keep the total number of files being monitored by a forwarder under 5,000 if possible.
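
For illustration, the split between two forwarder installations on the same host might look something like this (directory names are hypothetical; each instance needs its own install directory and management port):

# inputs.conf on forwarder instance 1
[monitor:///var/log/syslog/group_a]

# inputs.conf on forwarder instance 2
[monitor:///var/log/syslog/group_b]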

philyeo42
New Member

OK. This still doesn't work.
There are now <1000 files monitored and the parsingqueue is mostly full (~200MB). CPU usage is under 20% and Splunk is hardly using any of it. Why can't it keep up?


philyeo42
New Member

Thanks for the tips. Ideally this server (the raw syslog server) will keep a full set of raw logs, so I don't really want to delete them.

I have already got ignoreOlderThan = 2d BUT it is interesting to note that the file list in "list monitor" contains all the entries including files from several days ago.

There are ~260 logs each for today, yesterday and the day before, so approximately 800 logs in total should be monitored if it honours ignoreOlderThan. I guess it still scans them all to check whether they are older than the threshold...

Might have to go with option 2 then.


philyeo42
New Member

On the indexer I do not see any blocked=true.
On the forwarder there are a couple of entries over several days, but the numbers look odd:

Metrics - group=queue, name=parsingqueue, blocked=true, max_size_kb=102400, current_size_kb=102399, current_size=1682, largest_size=1689, smallest_size=1662

On forwarder:
splunk list monitor | wc -l
14264

(I had to raise the ulimit on the OS and increase max_fd in limits.conf on the Splunk forwarder.)
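
For reference, those two changes are of this general form; the values are illustrative, not the ones actually used:

# OS level, for the account running the forwarder
ulimit -n 8192

# $SPLUNK_HOME/etc/system/local/limits.conf on the forwarder
[inputproc]
max_fd = 4096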


lguinn2
Legend

Also, how many files is the forwarder monitoring? On the forwarder, run this command

splunk list monitor


Drainy
Champion

Out of interest, does blocked=true appear anywhere in the metrics.log on the indexer or forwarder?
