Getting Data In

Where does the forwarder enqueue files?

ddrillic
Ultra Champion

We see the following message on the forwarder:

10-18-2017 11:15:29.630 -0500 WARN  TailReader - Enqueuing a very large file=<hadoop large file> in the batch reader, with bytes_to_read=4981188783, reading of other large files could be delayed

Where does the forwarder enqueue the files, and is there a way to dequeue them?

The thread "When is the BatchReader used and when is the TailingProcessor used?" says:

-- The batch reader is used when the file is over 20 MB in size. Otherwise, the regular tailing processor queue is used. The batch reader only processes one file at a time, while the tailing processor can do many. The limit exists to prevent a bunch of large files from using up all slots and starving out new smaller files.
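
For reference, the 20 MB cutoff quoted above appears to correspond to the min_batch_size_bytes setting in the [inputproc] stanza of limits.conf on the forwarder. The fragment below is only a sketch: the value shown is the 20 MB figure expressed in bytes, and whether changing it is appropriate for a given environment is an assumption that should be checked against the limits.conf specification for your Splunk version.

```ini
# limits.conf on the forwarder (sketch, verify against your version's spec)
[inputproc]
# Files larger than this many bytes are handed to the batch reader;
# smaller files go through the regular tailing processor.
# 20971520 bytes = 20 MB, the threshold quoted in the thread.
min_batch_size_bytes = 20971520
```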

0 Karma
1 Solution

HiroshiSatoh
Champion

I think that there is no problem because TailReader and BatchReader are processed separately.
https://wiki.splunk.com/Community:HowIndexingWorks

Do you want to avoid ingesting the large files that are causing problems?
Or are there large files that you would like to prioritize for import?

0 Karma

ddrillic
Ultra Champion

You see, this thread relates to "Why are the queues being filled up on one indexer?"

In this one, I am looking at things from the forwarder's side. It seems to me that the BatchReader, when processing huge amounts of data, locks onto one indexer. The BatchReader process also seems to be irreversible: I moved out the flume app, and 6 hours later the enqueued files still started to flow into Splunk (on the same indexer). Only uninstalling the forwarder cleared the issue.

0 Karma

gjanders
SplunkTrust

Is it a universal forwarder reading the files?
If you are using Splunk 6.6 or newer, you might be able to use EVENT_BREAKER and EVENT_BREAKER_ENABLE in props.conf to tell the forwarder where each event ends. This allows it to switch output destinations without having to reach the end of the file.
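
A minimal sketch of what that could look like in props.conf on the universal forwarder. The sourcetype name hadoop_large_file is hypothetical, and the regular expression assumes single-line events separated by newlines; multi-line events would need a different pattern with a capturing group that matches the event boundary.

```ini
# props.conf on the universal forwarder (sketch)
# "hadoop_large_file" is a placeholder sourcetype name, not taken from the thread.
[hadoop_large_file]
# Let the forwarder recognize event boundaries so it can switch
# indexers in the middle of a large file.
EVENT_BREAKER_ENABLE = true
# Assumes one event per line; the capturing group marks the boundary.
EVENT_BREAKER = ([\r\n]+)
```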

0 Karma

ddrillic
Ultra Champion

It is the universal forwarder reading the files. I think the issue is TailReader versus BatchReader. What I see is that TailReader is real-time while BatchReader is not, and we also don't seem to have any control over the pending batches.

Then there is the association of the batch with the indexer. This past week, one indexer ended up receiving 3/4 TB of data a day, all of it streamed from this single batch on a single forwarder.

0 Karma

HiroshiSatoh
Champion

A single large file occupies one indexer, which may degrade overall performance. The solution is as described by garethatiag.
Increasing the number of forwarders and the number of indexer pipelines may also help.
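
As a rough illustration of the pipeline suggestion, the number of ingestion pipelines is set with parallelIngestionPipelines in the [general] stanza of server.conf. The value 2 below is an assumption for the example only; each additional pipeline uses more CPU and memory, so the right number depends on the hardware.

```ini
# server.conf on the indexer or forwarder (sketch)
[general]
# Example value only; the default is a single pipeline.
parallelIngestionPipelines = 2
```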

For performance troubleshooting, you need to know more about the environment and events. I think it would be better to describe the environment and events accurately and ask again.

0 Karma