Alerting

Visibly taking responsibility for a generated alert

SharplyUnclear
Engager

I am working on a call centre solution where alerts are raised (dropped calls, email queues building up, average call length too long, etc.) and displayed in a panel of a common Splunk application to a set of team leaders. When the problem goes away, the alert status goes 'green' (and the alert should disappear from the display panel).

I want a team leader to be able to say that they're taking responsibility for the alert, so that no-one else has to concern themselves with it, and for this information to be propagated to all users.

I would expect there to be 5-20 alerts active at any one time (in theory there could be a few hundred, but that would represent Armageddon). What approach would people take to designing this solution? Is it practical, say, to hold the alert information in a transient CSV file, and to capture an owner's decision to take responsibility for fixing the problem from an individual screen? Could I use inputcsv and outputcsv to control this mechanism, and would the status be propagated consistently across the system?
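To make this concrete, here is the sort of thing I have in mind (all file and field names are illustrative, and $ack_alert_id$ and $env:user$ would be dashboard tokens). A team leader clicking "take ownership" would run something like:

| inputcsv alerts.csv
| eval owner=if(alert_id="$ack_alert_id$", "$env:user$", owner)
| outputcsv alerts.csv

and the shared panel would be driven by:

| inputcsv alerts.csv
| where status!="green"
| table alert_id, description, status, owner

My concern is whether two team leaders writing with outputcsv at the same time could clobber each other's updates, which is really my consistency question above.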


SharplyUnclear
Engager

Thanks for your feedback and for broadly confirming the direction I'm taking. We're not going to implement a "poor man's" database transactional model, so there is a small chance that two people respond at the same time. I'll also make sure that only one instance of a particular alert is displayed on the bespoke panel we're controlling output to, along the lines sketched below.
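For the single-instance display, I'm planning something along these lines (assuming each row carries a stable alert_id and a raised_at timestamp, both names illustrative):

| inputcsv alerts.csv
| sort - raised_at
| dedup alert_id
| where status!="green"

so the panel only ever shows the most recent instance of each alert.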

I'll update this note later with how I get on.


dwaddle
SplunkTrust

In a traditional IT role, this is a good case for a Splunk partner like PagerDuty (www.pagerduty.com). The pre-built integrations hand off alerts from Splunk to PagerDuty as incidents, and PagerDuty maintains the responsible party (and their responsiveness). PagerDuty also handles escalations in the event of unresponsiveness.

But I think you would struggle to use PagerDuty for this role in the system you've described. If you're going to have to maintain state, what you're describing sounds reasonable - lookups for state are a common solution. One potential issue is if you have multiple instances of a given alert - which one is someone acknowledging / taking responsibility for?
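As a rough sketch of the lookup approach (file and field names here are purely illustrative, and you may need a lookup definition for alert_acks.csv), you could key each acknowledgement to both the alert and the specific instance. Recording an acknowledgement from a dashboard:

| makeresults
| eval alert_id="$alert_id$", instance_time="$instance_time$", owner="$env:user$", acked_at=now()
| fields - _time
| outputlookup append=true alert_acks.csv

Then the shared panel can enrich alerts with any recorded owner:

| inputlookup alerts.csv
| lookup alert_acks.csv alert_id, instance_time OUTPUT owner
| where status!="green"

Keying on instance_time as well as alert_id means an acknowledgement applies to one specific occurrence, which addresses the multiple-instances ambiguity.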
