Monitor search peers

jec013
Explorer

I have 2 servers, Splunk1 and Splunk2, set up as search peers. How can I use Splunk to monitor when one of the servers goes down or stops responding? I have received messages like the following:

-- Search generated the following messages --
Message Level: ERROR
1. Reading error while waiting for peer SPLUNK2. Search results might be incomplete!

I would like to be alerted when something like this happens. Does anyone have any ideas?


JimDeich
Path Finder

Back in version 3, the main search screen showed a note reading "x of y servers", for example "5 of 5 servers". If one was not responding, you could pull down a tab and immediately see which one.

This was a good idea, because users would immediately see any issue. I would like to see it come back.


yannK
Splunk Employee

Here are 2 methods to detect whether a search peer is down or has not responded to a search.

  • Schedule a search and count the number of peers responding

Pick a search that should always return results, and count the number of search peers that respond.
Then set up an email alert based on the number of search peers (including the search head).

Schedule the search every 5 minutes over the last 2 hours, and use the alert condition:
if number of events is less than X (where X is the number of servers you expect to respond)

index=_internal splunk_server=* | stats count by splunk_server
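As a variant of the search above, the comparison can be done inside the search itself, so the alert only has to fire when any result is returned. This is a minimal sketch: the threshold of 3 is an assumption (2 search peers plus the search head, as in the question) and the field name responding_servers is made up for illustration.

```spl
index=_internal splunk_server=* earliest=-2h
| stats dc(splunk_server) AS responding_servers
| where responding_servers < 3
```

With this form, the alert condition would be "if number of events is greater than 0". Adjust the threshold to the number of servers in your own deployment.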

  • Schedule a search looking in the logs for errors

This detects a search failure after the fact.
For example, schedule this search to run every 5 minutes over the last 5 minutes.

index=_internal source=*splunkd.log "Unable to connect to peer"
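To make the alert email easier to read, the log search can be summarized per host. The sketch below is only an illustration: it also matches the "Reading error while waiting for peer" text from the question, on the assumption (not verified here) that this message is written to splunkd.log as well, and the field names error_count and last_seen are hypothetical.

```spl
index=_internal source=*splunkd.log ("Unable to connect to peer" OR "Reading error while waiting for peer") earliest=-5m
| stats count AS error_count, latest(_time) AS last_seen BY host
| convert ctime(last_seen)
```

Alert when the number of results is greater than 0.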

One remark: a search peer may fail to respond because long-running searches are hitting the timeout settings. You can increase them if that is the case.
See connectionTimeout, sendTimeout, and receiveTimeout in distsearch.conf:
http://www.splunk.com/base/Documentation/latest/Admin/Distsearchconf
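As a rough sketch of what raising those timeouts could look like, assuming the settings live in the [distributedSearch] stanza on the search head and that a local override file is used. The file path and the values (in seconds) are illustrative assumptions only, so check the documentation linked above for the defaults in your version.

```ini
# Assumed location: $SPLUNK_HOME/etc/system/local/distsearch.conf on the search head
[distributedSearch]
# Illustrative values in seconds, not recommendations
connectionTimeout = 30
sendTimeout = 60
receiveTimeout = 900
```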
