indexing load balancing with [script] input

akazarov · 07-03-2013

Hello,

We have set up a small Splunk cluster with 3 indexers that receive data from a universal forwarder. Its output is configured as

[tcpout:default-autolb-group]
autoLBFrequency=40
server = pc-tdq-bst-04:9995, pc-tdq-bst-05:9995, pc-tdq-sfo-06:9995

and its input as

[script:///opt/splunkforwarder/bin/scripts/pbeast_injector.sh <parameters>]

The script never stops; it gets data from an external online monitoring system.
After indexing many events over a few days, we realized that the vast majority of events (over 99.9% in the example below) were indexed by the first indexer in the list, pc-tdq-bst-04. For example, a typical query returns stats like this:

dispatch.stream.remote.pc-tdq-bst-04.cern.ch    220 -   68,031,902
dispatch.stream.remote.pc-tdq-bst-05.cern.ch    2   -   4,584
dispatch.stream.remote.pc-tdq-sfo-06.cern.ch    1   -   2,386

The indexers are almost identical and have sufficient disk space. Earlier they were used for indexing files, and the load was distributed randomly, but the script input behaves quite differently.

Is there a way to enforce some sort of round-robin balancing for the [script] input, given that the script runs permanently?

Thanks
Andrei

gkanapathy · 07-03-2013

I believe that if your script injects EOF characters or null bytes into the output stream at appropriate points (e.g., between events) then the Splunk forwarder will allow a switch of that input to another indexer.

gkanapathy · 07-05-2013

You shouldn't need to close and reopen after every event, then. You only need to do it once every few seconds, or every few thousand events (e.g., keep a counter of events and only do it when counter % 20000 == 0).
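
The counter idea can be sketched roughly as below. This is only an illustration in Python, not the actual pbeast_injector.sh logic: the names emit_with_boundaries and on_boundary are hypothetical, and what the boundary action should be (closing and reopening the stream, or something else) is left to the caller. Nothing in this thread confirms that the forwarder actually switches indexers at such a boundary.

```python
import io


def emit_with_boundaries(events, out, every=20000, on_boundary=None):
    """Write each event to `out` and call `on_boundary` after every `every`-th event.

    `on_boundary` is a hypothetical hook standing in for whatever action is
    meant to mark a safe switching point in the stream.
    Returns the number of boundaries reached.
    """
    boundaries = 0
    for counter, event in enumerate(events, start=1):
        out.write(event + "\n")
        if counter % every == 0:
            out.flush()  # make sure complete events are delivered first
            if on_boundary is not None:
                on_boundary(counter)
            boundaries += 1
    out.flush()
    return boundaries


# Example run against an in-memory buffer instead of stdout.
buf = io.StringIO()
seen = []
n = emit_with_boundaries(
    (f"event {i}" for i in range(50000)), buf, every=20000, on_boundary=seen.append
)
```

With 50,000 events and a period of 20,000, the hook fires after events 20,000 and 40,000, so the per-event cost is one integer comparison rather than a close and reopen.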

akazarov · 07-05-2013

It turned out that adding EOF means calling fclose(stdout) and opening it again, which is not feasible at an event rate in the kHz range. Note that there is no EOF "character".

Adding null bytes between events did not help; Splunk just recorded the 0x00 bytes as part of the raw data.

I also tried the EOT character (0x04), with no effect.
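
To illustrate the point that EOF is a stream condition rather than a byte, here is a small self-contained Python sketch using an ordinary OS pipe. It says nothing about how the Splunk forwarder itself reads script output (that is not shown in this thread); read_until_eof is a hypothetical helper name. On a plain pipe, 0x00 and 0x04 arrive as regular data, and the reader only sees end-of-file once the writer closes its end.

```python
import os


def read_until_eof(read_fd):
    """Read from a file descriptor until the writer closes its end of the pipe."""
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:  # an empty read is how EOF shows up on a pipe
            break
        chunks.append(chunk)
    return b"".join(chunks)


r, w = os.pipe()
# NUL (0x00) and EOT (0x04) are written like any other bytes.
os.write(w, b"event 1\n\x00\x04event 2\n")
os.close(w)  # only closing the write end signals EOF to the reader
data = read_until_eof(r)
os.close(r)
```

Here data still contains both control bytes, which is consistent with the 0x00 bytes ending up in the raw event data.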

gkanapathy · 07-04-2013

Replication currently does not balance requests across buckets unless there is some kind of failure. Until that happens, the primary indexer for the bucket of data containing the event remains the one that first indexed it. Even when/if replication gains this feature, you may have too few events for that level of granularity to be visible, as replication occurs in bucket-sized increments, and buckets can contain up to a few hundred million events.

akazarov · 07-04-2013

Great idea, thanks!

However, even in the present configuration, given that we have replication factor = 2, I expected that 1/2 of the events would come from the bst-04 node and the remaining 1/4 + 1/4 from the other 2 nodes, because of replication. Replication should work even if I send all my data to one indexer, no?
