Getting Data In

Historical Data Indexing Speed

clyde772
Communicator

Hey Splunkers!

I have a question about testing Splunk's indexing speed. We are testing with a forwarder and an indexer, trying to ingest 500GB of data that resides on the forwarder. One thing we realized is that

  1. We had to adjust the bandwidth limit to maximize the transport of the data

but there seems to be a mechanism that throttles the indexing speed even after we removed the bandwidth limit. It was fast at first, then as it indexed older data the rate dropped significantly. So the question is:

Regardless of environment, how can we set up Splunk to maximize data ingestion speed?

In other words, how do we maximize the indexing speed for historical data sent over the network from a forwarder?

Thanks in advance for your answers~! Happy summer~!


Drainy
Champion

Basically you want to tune maxKBps in limits.conf on the forwarder:
http://docs.splunk.com/Documentation/Splunk/latest/admin/Limitsconf

This controls the rate at which the forwarder can forward data over the network, but actual throughput will also be limited by the network and by I/O on the local device, i.e. how quickly it can read the data (if it is a heavily used machine, for example, it may experience some delay).
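As a minimal sketch (assuming the default 256 KBps cap on a universal forwarder is what's throttling you), the stanza looks like this; 0 removes the cap entirely:

    # $SPLUNK_HOME/etc/system/local/limits.conf on the forwarder
    [thruput]
    # KB per second; 0 = unlimited (default on a universal forwarder is 256)
    maxKBps = 0

Restart the forwarder after changing it.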

On the indexer end you want to install Splunk on Splunk to monitor for blocked queues (or just search for them), because if Splunk cannot write to disc quickly enough it will begin to block each of its queues in turn until eventually it blocks the TCP input, and the forwarder will have to queue its data locally.
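If you'd rather search for them directly, something along these lines (assuming the default internal index is available) will show which queues are blocking and where:

    index=_internal source=*metrics.log* group=queue blocked=true
    | stats count by host, name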

To avoid the indexer queue blocking (the final queue, which writes to disc) you need to ensure you have a sufficiently high number of IOPS available. This will depend entirely on the amount of data the machine is dealing with elsewhere and the overall load, but really you want 800-1200 IOPS minimum. If the data cannot be written to disc quickly enough, it will block.
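As a rough sanity check on a Linux indexer (a generic OS-level check, not Splunk-specific), extended iostat output will tell you whether the volume holding your indexes is saturated:

    # extended device stats every 5 seconds; watch %util and await on the index volume
    iostat -x 5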

What specification of machine are you using for the indexer? If you are looking at 500GB then you should really have more than one indexer; in fact you should probably have quite a few to take the load off. Remember there is also processing load as it parses the arriving data and performs any index-time extractions you may have (which is hopefully minimal 😉 )
