Hi guys
We are testing search head clustering. We have one dedicated deployer and five search head cluster members. To deploy, we run the following command:
splunk apply shcluster-bundle --answer-yes -target https://[MEMBER_HOSTNAME]:8089 -auth [SPLUNK_USER]:[PW]
Sometimes it works. Sometimes it doesn't, and then it fails with one of several different errors.
Error no. 1:
Error while deploying apps to target=https://[MEMBER_HOSTNAME]:8089 with members=5: ConfDeploymentException: Error while updating app=XXX on target=https://[MEMBER_IP]:8089: Non-200/201 status_pre=500; {"messages":[{"type":"ERROR","text":"\n In handler 'localapps': Error during app install: failed to extract app from /appl/splunk/var/run/splunk/bundle_tmp/2753df224a95e6e5.bundle to /appl/splunk/var/run/splunk/bundle_tmp/1074b24058b88cde: No such file or directory"}]}
Error no. 2:
Error while deploying apps to first member: ConfDeploymentException: Error while fetching apps baseline on target=https://[MEMBER_IP]:8089: Network-layer error: Connection reset by peer
Error no. 3:
Error while deploying apps to target=https://[MEMBER_HOSTNAME]:8089 with members=5: ConfDeploymentException: Error while fetching apps baseline on target=https://[MEMBER_IP]:8089: Network-layer error: Connection reset by peer
Error no. 4:
Error when getting master uri from target to do a rolling-restart Error connecting: Connection refused
What puzzles me is this: why does it sometimes work and sometimes not?
Sometimes I have to run the deploy command more than five times in a row! It is starting to annoy me.
Has anyone else experienced this? Does anyone have a solution, or at least an explanation?
Thanks
- Muryoutaisuu
In our experience, some of these errors occurred when we deployed twice in quick succession: the search heads were still restarting or running post-start tasks. The error messages are somewhat misleading in that case.
However, I no longer recall which of the four errors occurred in that situation, or whether the issue also occurs on the first deployment attempt.
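Since the deployment often succeeds on a later attempt, and the reply above suggests the failures come from members that are still restarting, the manual re-runs could be wrapped in a retry loop that pauses between attempts. This is a minimal bash sketch, not a Splunk-provided tool: `retry_cmd`, `MAX_ATTEMPTS` and `WAIT_SECONDS` are hypothetical names, and the default values are guesses rather than recommended Splunk settings. It only works around the symptom; it does not explain the underlying errors.

```shell
#!/usr/bin/env bash
# Sketch only: re-run a command until it succeeds, pausing between attempts so
# cluster members have time to finish restarting. retry_cmd, MAX_ATTEMPTS and
# WAIT_SECONDS are hypothetical names; the defaults are guesses, not Splunk values.
MAX_ATTEMPTS="${MAX_ATTEMPTS:-5}"
WAIT_SECONDS="${WAIT_SECONDS:-60}"

# Run "$@" until it exits 0, or give up after MAX_ATTEMPTS tries.
retry_cmd() {
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
      echo "giving up after ${attempt} attempts" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${WAIT_SECONDS}s" >&2
    sleep "$WAIT_SECONDS"
    attempt=$((attempt + 1))
  done
  return 0
}

# Usage, with the placeholders from the original post:
# retry_cmd splunk apply shcluster-bundle --answer-yes \
#   -target https://[MEMBER_HOSTNAME]:8089 -auth [SPLUNK_USER]:[PW]
```

A fixed pause is the simplest choice; checking member state first (for example with `splunk show shcluster-status` on a member) would be a more targeted alternative.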
The following error is caused by a wrong configuration in server.conf. Double-check the "mgmt_uri" property:
mgmt_uri = https://:
Error when getting master uri from target to do a rolling-restart Service Unavailable
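To check the mgmt_uri setting on each member, the value can be read from the [shclustering] stanza of server.conf and tested against the https://<host>:<port> form. This is a minimal bash sketch under stated assumptions: `get_mgmt_uri` and `check_mgmt_uri` are hypothetical helper names, the parser is deliberately crude (first matching line only, no continuation lines), and it assumes the setting lives in `$SPLUNK_HOME/etc/system/local/server.conf`. `splunk btool server list shclustering` shows the effective, merged configuration if the setting might come from elsewhere.

```shell
#!/usr/bin/env bash
# Sketch only: read mgmt_uri from the [shclustering] stanza of a server.conf file
# and check that it has the form https://<host>:<port>. get_mgmt_uri and
# check_mgmt_uri are hypothetical helper names.

# Print the first mgmt_uri value found inside [shclustering] (empty if none).
get_mgmt_uri() {
  awk '
    /^\[/ { in_shc = ($0 ~ /^\[shclustering\]/) }
    in_shc && /^[ \t]*mgmt_uri[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, "")
      sub(/[ \t\r]+$/, "")
      print
      exit
    }
  ' "$1"
}

# Succeed only if the value looks like https://<host>:<port>.
check_mgmt_uri() {
  local re='^https://[^:/[:space:]]+:[0-9]+$'
  if [[ $1 =~ $re ]]; then
    echo "OK: $1"
  else
    echo "INVALID mgmt_uri: '$1'" >&2
    return 1
  fi
}

# Usage (assumes the setting lives in system/local):
# check_mgmt_uri "$(get_mgmt_uri "$SPLUNK_HOME/etc/system/local/server.conf")"
```

An empty or host-less value such as the one shown above fails the check.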
It sounds like you have a firewall or network problem. I'm consistently getting error no. 4 from your list above.
Nope. I'm getting the same errors too. It's not a firewall problem, as the systems are all directly connected.
The error I'm getting is:
/opt/splunk/bin/splunk apply shcluster-bundle -target https://:8089
Warning: Depending on the configuration changes being pushed, this command might initiate a rolling restart of the cluster members. Please refer to the documentation for the details. Do you wish to continue? [y/n]: y
Error when getting master uri from target to do a rolling-restart Service Unavailable
I upgraded from Splunk Enterprise 6.2.5 to 6.3 on Linux CentOS 6.5.
The error I get in splunkd.log on the cluster master when I run
[root@ClusterMaster ~]# splunk apply shcluster-bundle --answer-yes -target https://10.zz.yyy.x:8089 -auth admin:adminPasss
is:
09-25-2015 15:40:31.078 +0200 WARN AppsDeployHandler - Error while fetching members from uri=https://10.zz.yyy.x:8089: Non-200 status_code=503: Service Unavailable
Please help me resolve this!
We do not have any firewalls between the servers, nor do we have network problems.
On our second search head cluster I have no such trouble. I suspect that is because there is much less to deploy there. Perhaps the network load generated by the deployment causes these strange errors.