"Unable to distribute to peer" -- adjustable timeout?

batzel
Engager

I'm getting quite a few "Unable to distribute to peer..." messages when searching in Splunk.

The reasons given tend to be '...because peer has status = "Down".' or Authentication Failed.

Sometimes just reloading the page will get through to the search peers. Sometimes it gives me that error a number of times in a row. But I've verified that the peer is not down, and I can connect to it from the search head with no problems.

The splunk servers are in different datacenters, and all I can think of is that there's a little bit of network lag and the connections aren't being made quickly enough?

Is there a config option to alter whatever timeout there is for this? Am I on the right track, or can someone suggest what else to look at?

drrushi_splunk
Splunk Employee

Additionally, splunkd_access.log on the indexing peer will show the POST requests to this endpoint: /services/admin/auth-tokens
If these requests are taking longer than 10000 ms (10 seconds), then you are hitting the default receive timeout (authTokenReceiveTimeout).
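
To make the check above concrete, here is a minimal Python sketch that picks out slow auth-token requests from splunkd_access.log lines. It assumes each line carries the quoted request and ends with the elapsed time as a `<n>ms` field; that layout, the function name, and the sample lines are assumptions for illustration, so adjust the pattern to whatever your log actually contains.

```python
import re

# Assumption: a splunkd_access.log line contains the quoted request
# ("POST /services/admin/auth-tokens ...") and ends with "<n>ms".
AUTH_TOKEN_RE = re.compile(
    r'"POST \S*/services/admin/auth-tokens\S* [^"]*".*\s(?P<ms>\d+)ms\s*$'
)


def slow_auth_token_requests(lines, threshold_ms=10000):
    """Return (elapsed_ms, line) pairs for auth-token POSTs slower than threshold_ms."""
    slow = []
    for line in lines:
        match = AUTH_TOKEN_RE.search(line)
        if match is None:
            continue  # not an auth-token POST, or not in the assumed layout
        elapsed_ms = int(match.group("ms"))
        if elapsed_ms > threshold_ms:
            slow.append((elapsed_ms, line.rstrip("\n")))
    return slow


# Hypothetical sample lines, made up for this sketch (not real log output).
SAMPLE_LINES = [
    '10.0.0.5 - splunk-system-user [12/Mar/2013:10:15:02.123 +0000] '
    '"POST /services/admin/auth-tokens HTTP/1.1" 200 512 - - - 12034ms\n',
    '10.0.0.5 - splunk-system-user [12/Mar/2013:10:15:09.456 +0000] '
    '"POST /services/admin/auth-tokens HTTP/1.1" 200 512 - - - 48ms\n',
    '10.0.0.5 - admin [12/Mar/2013:10:15:11.789 +0000] '
    '"GET /services/server/info HTTP/1.1" 200 2048 - - - 20000ms\n',
]
```

On a real peer you would pass the open log file (normally under $SPLUNK_HOME/var/log/splunk/) instead of SAMPLE_LINES, and lower threshold_ms if you want to see requests that are merely approaching the timeout.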

raziasaduddin
Path Finder

Where else would we see these authToken related messages in the log? The indexers are still intermittently down and I cannot figure out why.

I tried:
[distributedSearch]
authTokenConnectionTimeout = 20
authTokenReceiveTimeout = 30
authTokenSendTimeout = 30

I still see this error after a minute or so:

Unable to distribute to peer named BLAH at uri https://BLAH:8089 because replication was unsuccessful. replicationStatus Failed

drrushi_splunk
Splunk Employee

The timeout settings for the authentication token exchange between search-head and peers are exposed now as configurable values in distsearch.conf (since v4.3.6):

authTokenConnectionTimeout =
* Maximum number of seconds to connect to a remote search peer, when getting its auth token
* Default is 5

authTokenSendTimeout =
* Maximum number of seconds to send a request to the remote peer, when getting its auth token
* Default is 10

authTokenReceiveTimeout =
* Maximum number of seconds to receive a response from a remote peer, when getting its auth token
* Default is 10

drrushi_splunk
Splunk Employee

If you don't see any offending requests on the peer and the auth status is still failed, then the request is not making it to the peer at all. In that case, investigate general connectivity to the peer and adjust authTokenConnectionTimeout and authTokenSendTimeout.
For failed connections, check splunkd.log on the search head for WARN messages from the UserManagerPro component:

WARN UserManagerPro - Unable to connect to peeruri=
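
As a rough way to investigate general connectivity from the search head, the sketch below times a plain TCP connect to a peer's management port. It only measures connection establishment, not TLS or the auth-token exchange itself, so treat it as a first-pass check. The function name is hypothetical; port 8089 and the 5-second timeout are taken from the values mentioned in this thread.

```python
import socket
import time


def time_connect(host, port=8089, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None if it fails or times out."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (including socket.timeout) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def summarize(host, attempts=5, port=8089, timeout=5.0):
    """Try several connects and return (failures, slowest_successful_seconds_or_None)."""
    results = [time_connect(host, port, timeout) for _ in range(attempts)]
    successes = [r for r in results if r is not None]
    failures = len(results) - len(successes)
    return failures, (max(successes) if successes else None)
```

Running summarize("BLAH") repeatedly from the search head (with your peer's hostname in place of BLAH) should show whether connects intermittently fail or creep toward the 5-second connection timeout.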

Ayn
Legend

There is indeed. Have a look at distsearch.conf (http://www.splunk.com/base/Documentation/latest/Admin/Distsearchconf), particularly the following parameters:

connectionTimeout = <int, in seconds>
* Amount of time in seconds to use as a timeout during search peer connection establishment.

sendTimeout = <int, in seconds>
* Amount of time in seconds to use as a timeout while trying to write/send data to a search peer.

receiveTimeout = <int, in seconds>
* Amount of time in seconds to use as a timeout while trying to read/receive data from a search peer.

The defaults for these (and other) settings are set in $SPLUNK_HOME/etc/system/default/distsearch.conf.

raziasaduddin
Path Finder

Did you ever solve this?

mslvrstn
Communicator

We are working this case with support. They've said:

After some further inquiries with our Dev team, I've learned that the timeout settings in distsearch.conf will not actually have any effect on the problem.
It seems that what is happening is that we are timing out at times while trying to read the auth token from the peer (Unable to connect to peer uri...). The httpclient timeouts that affect this behavior are actually hardcoded and NOT configurable.

connectionTimeout = 5;
sendTimeout = 10;
rcvTimeout = 10;

There is no exposed setting that you could use to control these timeouts.
