We are having intermittent issues where scheduled searches run on all search heads. This is what we see in the logs:
10-01-2015 14:11:28.731 +0000 ERROR SHPSlave - heartbeat failure (reason: failed method=POST path=/services/shcluster/captain/members/669AB89B-0FB3-48B8-A65A-92D15BDB805C captain=69.252.120.30:8089 rc=0 actual_response_code=502 expected_response_code=200 status_line=Error connecting: Connection refused error="Connection refused")
HTTP 502 is a Bad Gateway error, and the status line in your log shows the connection to the captain being refused, so your search heads cannot reach the captain. This is probably causing each cluster member to "elect" itself captain and then run the scheduled searches as if it were the only member of the cluster. You can review how a captain is picked here: http://docs.splunk.com/Documentation/Splunk/6.2.2/DistSearch/SHCarchitecture#Search_head_cluster_cap... . Once you have fixed the 502 error, the problem may go away.
Also, upgrade to 6.2.6 to rule out any bugs that might be present.
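To narrow down the connection failure, it may help to pull the captain address out of the heartbeat error and test whether its management port accepts TCP connections from each member. The following is only a rough sketch, not a Splunk tool: the function names `parse_heartbeat_failure` and `can_connect` are made up for illustration, and the parsing assumes the log line has the same `captain=host:port` and `actual_response_code=NNN` fields as the one posted above.

```python
import re
import socket

# The heartbeat failure line from the original post.
LOG_LINE = (
    '10-01-2015 14:11:28.731 +0000 ERROR SHPSlave - heartbeat failure '
    '(reason: failed method=POST '
    'path=/services/shcluster/captain/members/669AB89B-0FB3-48B8-A65A-92D15BDB805C '
    'captain=69.252.120.30:8089 rc=0 actual_response_code=502 '
    'expected_response_code=200 status_line=Error connecting: Connection refused '
    'error="Connection refused")'
)


def parse_heartbeat_failure(line):
    """Return (captain_host, captain_port, response_code), or None if no match.

    Hypothetical helper: assumes the field layout seen in the log line above.
    """
    captain = re.search(r'captain=([^\s:]+):(\d+)', line)
    code = re.search(r'actual_response_code=(\d+)', line)
    if not captain or not code:
        return None
    return captain.group(1), int(captain.group(2)), int(code.group(1))


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds.

    Hypothetical helper: this only checks reachability of the port; it does
    not speak HTTPS or authenticate against splunkd.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Run on each search head, something like `host, port, code = parse_heartbeat_failure(LOG_LINE)` followed by `can_connect(host, port)` would show whether that member can reach the captain's port at all. A `False` result points at the network, a firewall, or splunkd not listening on the captain rather than at the scheduler.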
Are you saying the same scheduled search is running simultaneously on all search heads in the cluster?
Yes, we're seeing an issue where a scheduled search runs on all search heads. When this happens, I can see the error above on all search heads in the cluster.
We have a 4-node cluster running 6.2.1.