Deployment Architecture

Why is our search head cluster scheduler failing following deployment or rolling restart?

duncangoff
Engager

We have a problem with the scheduler failing following a search head cluster (SHC) deployment, which is resolved only if we manually change the captain following the deployment. This is not an ideal solution, and we want to sort out the root cause.

Following last night's deployment, we saw the following sequence of events (mostly from the debug logs):

The SHC rolling restart begins, all peers are told to close down their searches in turn, and the restarts complete normally with no errors.

Then the captain tells the peers to remove artifacts ("DEBUG SHCMaster - remove artifact aid=scheduler~"). Most work fine, but two fail with the following errors:

"DEBUG SHCMaster - event=SHPMaster::asyncReplicationArtifact sid=154~ status=failed msg=sid is not an artifact but a remote search job "
"DEBUG SHCMaster - event=SHPMaster::asyncReplicationArtifact aid=154~ status=failed msg="Could not find artifact or sid"

From then on, the scheduler keeps repeating these errors, and no scheduled searches, accelerations, alerts, etc. run until the captain is transferred.
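To see which artifacts keep failing and how often, the debug log can be scanned for these events. Below is a minimal Python sketch, not a definitive tool: the regular expression is modelled only on the two DEBUG lines quoted above, the real log layout (timestamps, extra fields) is an assumption, and `count_failed_artifacts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern modelled on the two DEBUG lines quoted in this post. Anything else
# about the log layout (timestamps, additional fields) is an assumption.
FAILED_RE = re.compile(
    r"SHCMaster - event=SHPMaster::asyncReplicationArtifact\s+"
    r"(?:sid|aid)=(?P<id>\S+)\s+status=failed\s+msg=\"?(?P<msg>[^\"]*)"
)


def count_failed_artifacts(lines):
    """Count failed asyncReplicationArtifact events per (id, message) pair."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            key = (match.group("id"), match.group("msg").strip())
            counts[key] += 1
    return counts


# The sample lines are copied from the post; the ids are truncated there ("154~").
SAMPLE = [
    "DEBUG SHCMaster - remove artifact aid=scheduler~",
    "DEBUG SHCMaster - event=SHPMaster::asyncReplicationArtifact sid=154~ "
    "status=failed msg=sid is not an artifact but a remote search job ",
    'DEBUG SHCMaster - event=SHPMaster::asyncReplicationArtifact aid=154~ '
    'status=failed msg="Could not find artifact or sid"',
]

if __name__ == "__main__":
    for (artifact_id, msg), n in count_failed_artifacts(SAMPLE).items():
        print(f"{artifact_id}: {msg} ({n}x)")
```

Running this over the captain's log before and after a captain transfer would show whether the same ids are the ones repeating, which might help tell symptom from cause.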

I can't tell whether this is a symptom or the cause. My guess is that something went wrong with those searches, but what? And how do we stop it from happening?


lakshman239
Influencer

It looks to me like the captain election is not happening following the deployment/restart. Have you tried clearing the Raft state? You would also need to ensure that the KV store is healthy across all members, and check the monitoring console for any issues reported by the SH members. https://docs.splunk.com/Documentation/Splunk/7.2.3/DistSearch/Handleraftissues


duncangoff
Engager

The captain election happens fine with no issues, and the same goes for the KV store.
