indexing load balancing with [script] input

akazarov
Path Finder

Hello,

We have set up a small Splunk cluster with 3 indexers receiving data from a universal forwarder, whose output is configured as

[tcpout:default-autolb-group]
autoLBFrequency=40
server = pc-tdq-bst-04:9995, pc-tdq-bst-05:9995, pc-tdq-sfo-06:9995

and whose input is configured as

[script:///opt/splunkforwarder/bin/scripts/pbeast_injector.sh <parameters>]

The script never stops, it gets data from an external online monitoring system.
After indexing many events over a few days, we realized that the vast majority of events were indexed by the first indexer in the list, pc-tdq-bst-04. For example, a typical query returns stats like this:

dispatch.stream.remote.pc-tdq-bst-04.cern.ch    220 -   68,031,902
dispatch.stream.remote.pc-tdq-bst-05.cern.ch    2   -   4,584
dispatch.stream.remote.pc-tdq-sfo-06.cern.ch    1   -   2,386
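
To put the skew in numbers, here is a rough Python sketch that turns the figures above into per-indexer shares. It assumes the last column is the event count returned by each indexer; `event_shares` is just an illustrative helper name, not anything from Splunk.

```python
# Event counts copied from the stats above (assumed to be the last column).
counts = {
    "pc-tdq-bst-04": 68_031_902,
    "pc-tdq-bst-05": 4_584,
    "pc-tdq-sfo-06": 2_386,
}


def event_shares(counts):
    """Return each indexer's fraction of the total event count."""
    total = sum(counts.values())
    return {host: n / total for host, n in counts.items()}


if __name__ == "__main__":
    for host, share in event_shares(counts).items():
        print(f"{host}: {share:.4%}")
```

With these numbers, well over 99% of the events come from pc-tdq-bst-04.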

The indexers are almost identical and have sufficient disk space. Earlier they were used for indexing files, and the load was randomly distributed, but the behavior of the script input is quite different.

Is there a way to enforce some sort of round-robin balancing for the [script] input, given that the script runs permanently?

Thanks
Andrei

gkanapathy
Splunk Employee

I believe that if your script injects EOF characters or null bytes into the output stream at appropriate points (e.g., between events) then the Splunk forwarder will allow a switch of that input to another indexer.

gkanapathy
Splunk Employee

You shouldn't need to close and reopen after every event then. You only need to do it once every few seconds, or every few thousand events (e.g., keep a counter of events and only do it when counter % 20000 == 0).
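
A minimal sketch of that counter idea, written in Python rather than whatever language the injector script actually uses. The names `emit_events` and `on_boundary` are made up for illustration, and what the boundary action should do (for example, closing and reopening the output stream) is left as a hook, since it is an assumption here and not something confirmed to work.

```python
import io
from typing import Callable, Iterable, TextIO


def emit_events(
    events: Iterable[str],
    out: TextIO,
    every: int = 20000,
    on_boundary: Callable[[], None] = lambda: None,
) -> int:
    """Write one event per line; run on_boundary once per `every` events."""
    count = 0
    for event in events:
        out.write(event + "\n")
        count += 1
        if count % every == 0:
            # Flush first so the boundary falls between complete events.
            out.flush()
            on_boundary()  # hypothetical hook, e.g. close/reopen the stream
    out.flush()
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    boundaries = []
    emit_events((f"event {i}" for i in range(50)), buf, every=20,
                on_boundary=lambda: boundaries.append(1))
    print(len(boundaries), "boundaries for 50 events")
```

The expensive operation then runs once per batch rather than once per event, which is the point of the modulo check.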

akazarov
Path Finder

It turned out that adding EOF means calling fclose(stdout) and opening it again, which is not feasible at kHz event rates. Note that there is no EOF "character".

Adding 0x00 bytes between events did not help; Splunk just recorded the 0x00 bytes as part of the raw data.

I also tried the EOT character (0x04), with no effect.

gkanapathy
Splunk Employee

Replication currently does not balance requests across buckets unless there is some kind of failure. Until that happens, the primary indexer for the bucket of data containing the event remains the one that first indexed it. Even when/if replication gains this feature, you may have too few events for that level of granularity to be visible, as replication occurs in bucket-sized increments, and buckets can contain up to a few hundred million events.

akazarov
Path Finder

Great idea, thanks!

However, even in the present configuration, given that we have replication factor = 2, I expected that 1/2 of the events would come from the bst-04 node and the remaining 1/4 + 1/4 from the other 2 nodes, because of replication. Replication should work even if I send all my data to one indexer, no?
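
The arithmetic behind that expectation can be sketched as follows. This models only my assumption, not how Splunk actually chooses which copy answers a search: every bucket originates on one indexer, its replicas are spread evenly over the other peers, and a search picks uniformly among the copies of each bucket. `expected_search_share` is a hypothetical helper name.

```python
from fractions import Fraction


def expected_search_share(indexers, origin, replication_factor=2):
    """Expected fraction of events served by each indexer, assuming
    uniform replica placement and uniform choice among bucket copies."""
    others = [h for h in indexers if h != origin]
    # The origin holds one of the `replication_factor` copies of every bucket.
    shares = {origin: Fraction(1, replication_factor)}
    # The remaining copies are split evenly across the other peers.
    rest = Fraction(replication_factor - 1, replication_factor)
    for host in others:
        shares[host] = rest / len(others)
    return shares


if __name__ == "__main__":
    peers = ["pc-tdq-bst-04", "pc-tdq-bst-05", "pc-tdq-sfo-06"]
    for host, share in expected_search_share(peers, "pc-tdq-bst-04").items():
        print(host, share)
```

For three peers and replication factor 2 this gives 1/2 for bst-04 and 1/4 for each of the other two, which is far from the observed stats.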
