Search head cluster is breaking when mgmt_uri has a question mark. How do I get the correct mgmt_uri?

wegscd
Contributor

We lost one node out of a three-node search head cluster, so we went to static captaincy.

Somewhere along the line, scheduled searches stopped working. Usually restarting one of the search heads got things going again, but right now the shcluster is in a mess.

The thing that always seems to accompany the trouble is mgmt_uri showing up as '?' in 'splunk show shcluster-status'.
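For reference, the status output below comes from the standard CLI check, run on each member (host and credentials are placeholders):

    # run on a search head cluster member; -auth is optional if already logged in
    $SPLUNK_HOME/bin/splunk show shcluster-status -auth admin:<password>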

Right now I have static captaincy transferred to node adculsplunkp6. A 'show shcluster-status' there shows:

 Captain:
                  dynamic_captain : 0
                  elected_captain : Thu Jan  7 09:52:39 2016
                               id : F0214F20-327E-4591-ACC7-A03929CF829F
                 initialized_flag : 1
                            label : adculsplunkp6
                 maintenance_mode : 0
                         mgmt_uri : ?
            min_peers_joined_flag : 1
             rolling_restart_flag : 0
               service_ready_flag : 1

 Members: 
    adculsplunkp6
                            label : adculsplunkp6
                         mgmt_uri : ?
                   mgmt_uri_alias : https://xx.xx.xx.xxx:8089
                           status : Up
    adculsplunkp2
                            label : adculsplunkp2
                         mgmt_uri : ?
                   mgmt_uri_alias : https://xx.xx.xx.xx:8089
                           status : Up

On the other node (the non-captain), it still shows a different captain and no members:

Captain:
                  dynamic_captain : 0
                  elected_captain : Thu Jan  7 10:01:16 2016
                               id : F0214F20-327E-4591-ACC7-A03929CF829F
                 initialized_flag : 1
                            label : adculsplunkp2
                 maintenance_mode : 0
                         mgmt_uri : ?
            min_peers_joined_flag : 1
             rolling_restart_flag : 0
               service_ready_flag : 1

 Members:

How do I get the correct mgmt_uris in there so things start behaving again?

gaurav_splunk
Splunk Employee

This issue has been fixed in 6.4.7 and 6.3.11, so feel free to upgrade your environment.


risgupta_splunk
Splunk Employee

The issue here is that, in the case of static captaincy, the mgmt_uri is read from memory. When the node is restarted, the value is lost and is not re-read from disk/config, hence the "?" in the show shcluster-status output.
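For reference, the persisted value normally lives in the [shclustering] stanza of server.conf on each member and can be set with the CLI (host name below is a placeholder); on versions that include the fix it should be re-read at startup:

    # run on each member, using that member's own management URI
    $SPLUNK_HOME/bin/splunk edit shcluster-config -mgmt_uri https://<member-fqdn>:8089
    $SPLUNK_HOME/bin/splunk restart

    # the equivalent on-disk setting, in $SPLUNK_HOME/etc/system/local/server.conf:
    # [shclustering]
    # mgmt_uri = https://<member-fqdn>:8089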

wegscd
Contributor

I am going back to static because whenever I call Splunk support about my two-node cluster, they tell me I am in an unsupported configuration. That third node is gone, and they don't support two-node clusters.

The saved searches issue was caused by a 6.3.0 bug; apparently they started tracking the number of running searches across the cluster, there was a bug in that, and eventually the cluster decided everyone was over quota and stopped scheduled searches. Details at https://answers.splunk.com/answers/329518/why-do-scheduled-searches-randomly-stop-running-in.html


nvanderwalt_spl
Splunk Employee

You don't really need to go to static if you have 2 of the 3 nodes available. Were you doing it as a preventive measure in case you lost another node? If so, it would only cover you if you lost the non-captain.

Did you configure both remaining nodes to use the same static captain? Did you use fully qualified domain names?
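For comparison, the documented static-captain setup looks roughly like this (URIs are placeholders for your own hosts):

    # on the member acting as static captain
    $SPLUNK_HOME/bin/splunk edit shcluster-config -mode captain -captain_uri https://<captain-fqdn>:8089 -election false

    # on each remaining non-captain member
    $SPLUNK_HOME/bin/splunk edit shcluster-config -mode member -captain_uri https://<captain-fqdn>:8089 -election false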

You can go back to dynamic captaincy by bootstrapping one of the members (preferably the old static captain), then converting the others.

See http://docs.splunk.com/Documentation/Splunk/6.3.0/DistSearch/Staticcaptain
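Roughly, the revert-to-dynamic sequence looks like this (hosts and credentials are placeholders; follow the doc above for the exact order on your version):

    # 1. on each member, starting with the old static captain, turn election back on
    $SPLUNK_HOME/bin/splunk edit shcluster-config -election true -mgmt_uri https://<this-member>:8089
    $SPLUNK_HOME/bin/splunk restart

    # 2. then bootstrap one converted member as the first dynamic captain
    $SPLUNK_HOME/bin/splunk bootstrap shcluster-captain -servers_list "https://<member1>:8089,https://<member2>:8089" -auth admin:<password>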

One last thing: are all your saved searches failing, or only some? If it is only some, it could be because you now have fewer cores available, which decreases the number of concurrent searches you can run.


wegscd
Contributor

I used the same static captain on both, and specified an IP address (not my choice; the guy who set up the cluster did it that way).
