Splunk Search

Is it normal for a rolling restart of 18 indexers to take 12-24 hours?

mschlapfer
Explorer

We have recently been having an issue where a rolling restart of our indexer cluster can take 12-24 hours for 18 indexers. We are on Splunk 7.0.7. We pushed some changes a couple of weekends ago and it took about 22 hours to complete the restart of all indexers. The weekend before, it took about 10 hours. Is anyone else seeing rolling restarts take this long? How long should we expect an 18-indexer cluster to take to complete its restarts? Any advice on where to look to find out why it is taking so long?

thanks,

Marcel

DavidHourani
Super Champion

Hi There,

Indexers usually take time to restart because hot buckets roll to warm. When very high volumes are going into each indexer, the restart process slows down.

A couple of things to consider to help make the process smoother:

1- Are you using a normal rolling restart or a searchable rolling restart? If your restarts are taking too long, a searchable rolling restart is a good way to keep search interruption minimal:
https://docs.splunk.com/Documentation/Splunk/7.2.0/Indexer/Userollingrestart#How_searchable_rolling_...

2- Check your replication and search factors. If you're using multisite and your sites hold only a single copy of the data, running a rolling restart without specifying the percentage of hosts will slow the process down. By default, 10% of the servers restart at the same time; since you have 18 servers, 2 will restart at once, possibly causing holes in your scheduled searches depending on your RF and SF:
https://docs.splunk.com/Documentation/Splunk/7.2.0/Indexer/Userollingrestart#Specify_the_percentage_...

3- Check the details of your current Splunk version for anything related to slow restarts. That could help you find related bugs, and upgrading just might fix it ^^
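
To put rough numbers on point 2, here is a minimal Python sketch of the batch arithmetic. It assumes the percentage is rounded up to a whole number of peers (this matches the "2 of 18" figure above but is not confirmed against Splunk's actual behavior) and that batches restart strictly one after another. The function names are made up for illustration.

```python
import math


def peers_per_batch(total_peers: int, percent: float) -> int:
    """Peers restarted at once. ASSUMPTION: round up, never fewer than 1."""
    return max(1, math.ceil(total_peers * percent / 100))


def batch_count(total_peers: int, percent: float) -> int:
    """Number of sequential batches needed to cover every peer."""
    return math.ceil(total_peers / peers_per_batch(total_peers, percent))


def hours_per_batch(total_hours: float, total_peers: int, percent: float) -> float:
    """Average wall-clock time per batch for an observed total duration."""
    return total_hours / batch_count(total_peers, percent)


# 18 indexers, as in the question; 10% is the default mentioned above.
for pct in (10, 20, 30):
    print(
        f"{pct}%: {peers_per_batch(18, pct)} peers per batch, "
        f"{batch_count(18, pct)} batches"
    )

# The 22-hour restart from the question, spread over the 10% batches.
print(f"about {hours_per_batch(22, 18, 10):.1f} hours per batch")
```

Under these assumptions, 18 peers at 10% means 9 batches, so a 22-hour restart works out to roughly 2.4 hours per batch. That suggests each individual restart is slow, not only that there are many of them.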

Cheers,
David


dkeck
Influencer

Hi Marcel,

I can't tell you how long it should take, but maybe a second experience could help. Our biggest environment has 9 indexers.

For me, after a bundle push, it's sometimes not clear if a rolling restart is necessary or not.

I do see that the restart, if necessary, can take up to a couple of hours. During this time, check the splunkd.log of your indexer(s) on the CLI. You will probably see a lot of bucket moving, normally for every index you have. And after the rolling restart, most of the time there will be a lot of fixup tasks too. You can check them in the cluster master dashboard under indexes->buckets. Until these fixup tasks are done, the cluster will not meet all of its factors.
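
As a rough way to gauge how much bucket activity shows up in the log, a small Python sketch like the one below could count the lines that mention a keyword. The log path assumes a default /opt/splunk install, the keyword "bucket" is only a guess at what to look for, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def count_mentions(lines: Iterable[str], keyword: str = "bucket") -> int:
    """Count lines containing the keyword, ignoring case."""
    needle = keyword.lower()
    return sum(1 for line in lines if needle in line.lower())


def count_in_file(path: Path, keyword: str = "bucket") -> int:
    """Stream a log file line by line and count keyword mentions."""
    with open(path, errors="replace") as handle:
        return count_mentions(handle, keyword)


if __name__ == "__main__":
    # ASSUMPTION: default install location; adjust to your SPLUNK_HOME.
    log_path = Path("/opt/splunk/var/log/splunk/splunkd.log")
    if log_path.exists():
        print(f"{count_in_file(log_path)} lines mention 'bucket'")
    else:
        print(f"{log_path} not found; point log_path at your splunkd.log")
```

Running it a few times during the restart and comparing the counts gives a crude sense of whether bucket activity is still ongoing on that indexer.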

There is not much in the docs about a "slow restart", but you might want to check this page. You can edit some values that might lead to a faster restart: http://docs.splunk.com/Documentation/Splunk/7.2.0/Indexer/Userollingrestart#Handle_slow_restarts

Kind Regards
