DMC Alert - Search Peer Not Responding; How to make the alert more lenient to reduce false positives?

muebel
SplunkTrust

DMC Alert - Search Peer Not Responding is great for getting notifications when a Splunk instance is having issues, but I find that it will fire off false positives throughout the day. My suspicion is that either the Distributed Management Console or its search peers are busy when a status check is initiated, and the check times out.

I'd be interested in increasing this timeout as a way of troubleshooting the issue, but am not quite sure which configuration setting controls it. The alert in question:

| rest splunk_server=local /services/search/distributed/peers/ 
| where status!="Up" 
| fields peerName, status 
| rename peerName as Instance, status as Status

And when I read the distsearch.conf spec:

statusTimeout = <int, in seconds>
* Set connection timeout when gathering a search peer's basic info (/services/server/info).
* Note: Read/write timeouts are automatically set to twice this value.
* Defaults to 10.

My expectation is that increasing statusTimeout on the DMC will give the search peers more slack while the DMC gathers each peer's info, which in turn should result in fewer peers showing up as "Down" in /services/search/distributed/peers/
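
For reference, a minimal sketch of what that change could look like on the DMC instance. The value 30 is an arbitrary illustration rather than a recommendation, and the file location under $SPLUNK_HOME/etc/system/local/ is an assumption about this deployment. Per the spec excerpt above, the read/write timeouts would then be set to twice this value.

```ini
# $SPLUNK_HOME/etc/system/local/distsearch.conf on the DMC (location assumed)
[distributedSearch]
# Spec default is 10 seconds; 30 is an illustrative value only.
statusTimeout = 30
```

A splunkd restart may be needed before the new value takes effect.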

Has anybody done anything along these lines? Is there anything I am missing or should look into more? Thanks for any advice!

hexx
Splunk Employee

Ultimately, this depends on the exact nature of the failure that leads the distributed search framework on the DMC to declare the status of some of your peers as "down" intermittently.

I think that statusTimeout is a good guess here if you need to pick one timeout setting to extend, but it would be better to review the details of the peer failure in splunkd.log to understand which timeout led to the peer being declared down (the search head should know this) and what was happening on the peer itself at the time.
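
One way to start that review is to search the search head's own internal logs. This is only a sketch: the DistributedPeer* component filter is an assumption about how the relevant messages are tagged and may differ between Splunk versions, so widen or drop it if nothing comes back.

```spl
index=_internal sourcetype=splunkd component=DistributedPeer* (log_level=WARN OR log_level=ERROR)
| table _time, host, component, log_level, _raw
| sort - _time
```

Comparing the timestamps of these events with the times the alert fired should help show whether a timeout or some other failure preceded each "Down" status.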
