Splunk Search

Why are we getting error "Timed out waiting for peer XXX", but the search status=success?

secfrit
Explorer

To monitor if my nightly searches ran properly I'm looking at:

index=_internal sourcetype=scheduler earliest=@d | <few_more_filtering>

but I've just noticed that when one of the involved peers hits a receiveTimeout error, the "status" field in the resulting events still contains the value "success", even though opening the search results from the job list shows an error:

Timed out waiting for peer XXX. If this occurs frequently, receiveTimeout in distsearch.conf may need to be increased. Search results might be incomplete!

I tried to run a global search like:

 splunk_server=* index=* "Timed out waiting for peer"

But nothing is popping up.

Is there a way to set up an alert in case a search ran, but failed or had any issues? The "status" field doesn't seem to cover the latter scenario...

swmishra_splunk
Splunk Employee

This error occurs when your Search Head attempts to send a search job to a Search Peer (usually one of your Indexers) and the Indexer does not respond within the timeout period (the receiveTimeout setting named in the error message). The search then continues without that Indexer, which probably means that some of your events are not returned and your results are incomplete. In my experience, the problem can often be cleared simply by restarting the Splunk instance on the Indexer in question, but sometimes you need to dig deeper. In any case, something is keeping that Indexer so busy that it cannot reliably respond to search requests even though the Splunk instance is running. I am sure this kind of thing can also commonly be caused by misconfigured or misbehaving load balancers or other identity/load-shifting equipment sitting between your Search Head and your Indexer peers.

secfrit
Explorer

As a workaround, I'm now checking the messages.error field from the REST API (i.e. /services/search/jobs); the error messages are available there.
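A minimal sketch of that check in Python, assuming the endpoint is queried with output_mode=json and that each job entry carries its messages under content.messages. The exact response shape is an assumption here, so both a dict keyed by severity and a list of type/text pairs are handled. The helper names and the sample payload are hypothetical, and fetching the payload (authentication, host, port) is left out on purpose:

```python
from typing import Any, Dict, List


def error_texts(messages: Any) -> List[str]:
    """Extract error-level message texts from a job's messages field."""
    if isinstance(messages, dict):
        # Assumed shape 1: {"error": ["..."], "info": ["..."]}
        errors = messages.get("error", [])
        return [errors] if isinstance(errors, str) else list(errors)
    if isinstance(messages, list):
        # Assumed shape 2: [{"type": "ERROR", "text": "..."}]
        return [
            str(m.get("text", ""))
            for m in messages
            if isinstance(m, dict) and str(m.get("type", "")).lower() == "error"
        ]
    return []


def jobs_with_errors(payload: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each job's sid to its error messages, skipping jobs without errors."""
    flagged: Dict[str, List[str]] = {}
    for entry in payload.get("entry", []):
        content = entry.get("content", {})
        errors = error_texts(content.get("messages"))
        if errors:
            sid = str(content.get("sid", entry.get("name", "unknown")))
            flagged[sid] = errors
    return flagged


# Hypothetical payload standing in for a parsed /services/search/jobs response.
sample_payload = {
    "entry": [
        {"name": "job_ok", "content": {"sid": "job_ok", "messages": {}}},
        {
            "name": "job_timeout",
            "content": {
                "sid": "job_timeout",
                "messages": {"error": ["Timed out waiting for peer XXX."]},
            },
        },
        {
            "name": "job_list_shape",
            "content": {
                "sid": "job_list_shape",
                "messages": [
                    {"type": "ERROR", "text": "Timed out waiting for peer XXX."},
                    {"type": "INFO", "text": "Search finished."},
                ],
            },
        },
    ]
}

for sid, errors in jobs_with_errors(sample_payload).items():
    print(sid, errors)
```

Anything this returns is a job that finished but still carried an error message, which is exactly the case the scheduler "status" field reports as success.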

I still think the status field in the scheduler log events should be set to something other than success if something actually went wrong 😉

meenal901
Communicator

index=* will not give you results from the _internal index. Try:

index=_internal splunk_server=* "Timed out waiting for peer"

secfrit
Explorer

Yeah, I forgot to say I've already tried with index=_* too, but nothing there either.
