Getting Data In

Balancing Summary data across indexers

tjago11
Communicator

I've read through the indexer rebalancing doc and that seems like good maintenance, but I'm looking for something more proactive when running Summary Index jobs.

I found an answer that explains what is going on, but I'd like a way to fix it. As it stands, when the spool file is generated and the SH sends the data, it just blasts everything to a single indexer. This means all my summary data for that job run resides on a single indexer. :sadpanda:

I've tried to get around this by running the job at a higher frequency than is needed to get a better distribution, but when I pull the data it is still very unbalanced.

I'm now in a situation where my customer wants to back-fill several months of data into a summary index, and my concern is that the back-fill will have the same behavior as the scheduled Summary job. Namely, it will put several months of data on a single indexer. :angry:

Reading the documentation for the collect command, it certainly seems like it will behave the same as the Summary job. Specifically, this line:

  • spool — Syntax: spool=<bool>. Description: If set to true, the summary indexing file is written to the Splunk spool directory, where it is indexed automatically.

That same verbiage appears in the configuration section of Summary Index jobs.
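
For context, the kind of search I'd run for the back-fill ends in collect, along these lines (the index, sourcetype, and field names here are only placeholders):

index=web sourcetype=access_combined earliest=-30d@d latest=@d
| sistats count by host
| collect index=summary_web_daily spool=true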

Ultimately I need a way to have my data balanced across the indexers; it would be great if this could be done at ingestion time. Thanks.

p.s.
I tried to post links for the docs but I'm too much of a noob.

1 Solution

tjago11
Communicator

I've been working with our Customer Success Manager and opened a case with Splunk. As of right now there is not a good solution for ensuring that Summary Index data is well balanced. @sjohnson has a good solution if the data files are very large and/or the index cluster is small. Unfortunately, with data files that are under 1 MB and an index cluster with 18 indexers, this won't work for us.

A couple of things I'm doing to mitigate the poor balancing:

  • Run Summary Jobs that roll up to a daily value every 4 hours (running more often makes it more likely that the data isn't all on one indexer)
  • Run back-fill scripts multiple times to get the data in chunks (same idea as above; see the sketch after this list)
  • Run Indexer Rebalancing as part of quarterly maintenance
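
For the chunked back-fill, one option is the fill_summary_index.py script that ships with Splunk, run once per slice of the back-fill window (the app, saved-search name, time range, and credentials below are placeholders):

splunk cmd python fill_summary_index.py -app search -name "Daily Web Rollup" -et -90d@d -lt -60d@d -j 4 -dedup true -auth admin:changeme

Repeat with the next -et/-lt window (-60d@d to -30d@d, and so on) until the whole period is covered.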

Ultimately there is not a clean way to ensure Summary Index data is ingested into an index cluster in a balanced fashion, but by running things more often and doing regular maintenance the impact can be mitigated. Thanks.

sjohnson_splunk
Splunk Employee

Likely the reason for the poor distribution across indexers is that the SH can read the summarized data very quickly. Unless you have tweaked your outputs.conf on the SH, it will stick to a single indexer for 30 seconds, which is probably more than enough time to read the file to EOF.

You might try adding something like this to the outputs.conf:

[tcpout]
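# force time-based switching between indexers, even mid-stream,
# and drop the switch interval from the default 30 seconds to 10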
forceTimebasedAutoLB=true
autoLBFrequency = 10

If the results are really small (a couple of thousand events), this still might not be a small enough interval, but I don't think I would go much below 10 seconds.


tjago11
Communicator

After some further research, this approach isn't going to work. Our summary jobs complete in just a few seconds with file sizes well under 1 MB, so unless we changed the autoLBFrequency property to something freakishly small like 2s, we would still have everything go to a single indexer. Furthermore, even if we did change the value, it would send the data to maybe two indexers...we have 18. Two is better than one, but still not good.


tjago11
Communicator

From what I gather from your response, this would affect scheduled Summary Jobs as well as any one-time back-fill, since they both use stash files?

Thanks for the tip. I'll look into using that property and come back to accept if it works. Given my sample set, you are likely right that the generated file is processed well below the default 30s threshold.
