Getting Data In

How to make Splunk read my files faster?

Jason
Motivator

I have a folder of 100 1GB files on a forwarder that I need to get into Splunk ASAP via a monitor:// input. There is one forwarder, distributing to 10 indexers, so I need to read as many of these files at once as I can.

The forwarder has unlimited upload bandwidth, and I have raised the threshold for using the BatchReader, but it STILL reads files one at a time. How do I fix this?

# limits.conf
[inputproc]
# Raise the file-size threshold above which Splunk uses the single-threaded BatchReader to ~2GB
min_batch_size_bytes = 2147000000

# Remove the output throughput cap (0 = unlimited)
[thruput]
maxKBps = 0

sowings
Splunk Employee

Setting forceTimebasedAutoLB in outputs.conf should make better use of your indexing tier in this case. It might not help with the "read more than one file at a time" bit, but it will spread the load across the indexer tier more uniformly, taking advantage of input queues and the like. Your current setup is probably ending up with one whole file going to the first indexer, the next file to the second indexer, and so on, switching only on EOF.
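
A rough sketch of what that could look like in outputs.conf on the forwarder. The group name and indexer hostnames below are made up for illustration, only three of the ten indexers are listed, and autoLBFrequency is shown at what I understand to be its default of 30 seconds:

```ini
# outputs.conf on the forwarder (illustrative sketch; group name and hostnames are hypothetical)
[tcpout]
defaultGroup = my_indexers

[tcpout:my_indexers]
server = idx01.example.com:9997, idx02.example.com:9997, idx03.example.com:9997
# Switch indexers on the timer instead of waiting for EOF on the current file
forceTimebasedAutoLB = true
# Seconds between switches to another indexer in the list
autoLBFrequency = 30
```

Because the switch is forced on a timer rather than at a file boundary, it is worth checking after enabling this that events still arrive intact on the indexers.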


MuS
SplunkTrust

More a workaround than a fix: split the files into 10 directories and run multiple instances of the UF, each monitoring one directory and distributing to the 10 indexers.
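
As a sketch of that layout, assuming the files have been moved into hypothetical directories /data/incoming/part01 through /data/incoming/part10 and each UF instance is installed in its own SPLUNK_HOME, the inputs.conf for the first instance might look something like this (index and sourcetype are placeholders):

```ini
# inputs.conf for UF instance 1 (path, index and sourcetype are hypothetical)
[monitor:///data/incoming/part01]
index = main
sourcetype = my_sourcetype
disabled = false
```

Instance 2 would monitor part02, and so on. Instances sharing a host also each need their own management port, since they cannot all bind the default one.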

vliggio
Communicator

Note that the min_batch_size_bytes is NOT actually bytes, but MB (see the limits.conf docs). So 2GB should be 2048, not 2147000000.


hays2
Observer

I believe this might have been changed to MB in 6.6.x (just comparing the min_batch_size_bytes text in the limits.conf spec). Any references to this that I've seen have always been in bytes.
