Splunk Search

How to send a recovery alert only when there is a corresponding alert?

anasar
New Member

Hi,

We have many indexes, like server and core, and we have a lookup table with two columns: exceptions and threshold. The requirement is to search for all exceptions in each index, and if any exception from the lookup table is found in the index and its count is equal to or greater than the threshold value from the lookup, then alert.

After 5 minutes, it should search again, and if the exceptions are no longer happening in the index, it should send a recovery alert only for those exceptions for which we already sent an alert. So the intention here is to send a recovery alert only when there was a corresponding problem alert.

While alerting, we need to specify which exception happened, how many events occurred (this should be greater than or equal to the threshold), the source IP, etc. The recovery alert should also contain the same info.

Example:

Lookup table: exception.csv
exceptions,threshold
OutOfMemoryError,1
ORA-1112, 5
JVMExceptions, 2
etc.   
1 Solution

gcusello
SplunkTrust

Hi anasar,
if the exceptions are in a field, they are easier to manage:

your_search [ | inputlookup exception.csv | fields exceptions ] 
| stats count by exceptions 
| lookup exception.csv exceptions OUTPUT threshold 
| where count >= threshold 
| table exceptions count threshold 

If instead (as I think is your case) the exceptions are strings inside your raw events, it's less easy!

your_search [ | inputlookup  exception.csv | rename exceptions AS query | fields query ] 
| rename _raw as rawText
| eval foo=[
   | inputlookup exception.csv 
   | eval query="%"+exceptions+"%" 
   | stats values(query) AS query 
   | eval query=mvjoin(query,",") 
   | fields query 
   | format "" "" "" "" "" ""
   ]
| eval foo=split(foo,",") 
| mvexpand foo 
| where like(rawText,foo)
| eval exceptions=trim(foo,"%") 
| stats count by exceptions 
| lookup exception.csv exceptions OUTPUT threshold 
| where count >= threshold 
| table exceptions count threshold 

This solves the first requirement, but I'm not sure about the second one: "After 5 minutes, search again, and if the exceptions are no longer happening in the index, send a recovery alert only for those exceptions for which we already sent an alert. So the intention here is to send a recovery alert only when there was a corresponding alert."
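
A possible approach (only a sketch, not tested) could be to save the exceptions that fired the alert into a state lookup with outputlookup at the end of the alert search; the lookup name alert_state.csv here is just a placeholder:

| where count >= threshold 
| table exceptions count threshold 
| outputlookup alert_state.csv

Then the recovery search, scheduled 5 minutes later, could read that state back, recount the same exceptions over the last 5 minutes (reusing the same matching logic as above), and keep only the ones that dropped below their threshold:

| inputlookup alert_state.csv 
| rename count AS alerted_count 
| join type=left exceptions 
    [ search your_search earliest=-5m 
      | stats count AS current_count by exceptions ] 
| fillnull value=0 current_count 
| lookup exception.csv exceptions OUTPUT threshold 
| where current_count < threshold 
| table exceptions alerted_count current_count threshold

At the end you could also overwrite alert_state.csv with only the exceptions that are still over threshold, so the next run doesn't send the same recovery alert twice.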

Bye.
Giuseppe


anasar
New Member

Thank you gcusello. It works.


anasar
New Member

I'm planning to use a summary index to save the alerts and use it for the recovery alerts, but I'm not clear on how to do the other parts. Please help.
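
Something like this is what I had in mind (not tested; the index name summary_alerts is just a placeholder). The alert search would write what fired into the summary index:

| where count >= threshold 
| table exceptions count threshold 
| collect index=summary_alerts

and the recovery search would read the previous run back from the summary index (for example index=summary_alerts earliest=-10m@m latest=-5m@m) and compare it against a fresh count over the last 5 minutes, sending the recovery alert only for the exceptions that are now below their threshold. Is that the right direction?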


anasar
New Member

Also, I need to add one more column to the lookup file: severity, which says whether the exception is a warning (1) or critical (2). So while sending the mail we need to look at the severity, and the alert subject will be "Critical Problem alert ..." or "Warning Problem alert ...". The recovery alert ("Recovery ...") doesn't need to refer to the severity field. So exception.csv will look like:

exceptions, threshold, severity
OutOfMemoryError, 1, 2
ORA-1114, 5, 1
JVMExceptions, 10, 2
etc.
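
For the subject, I was thinking of something like this at the end of the alert search (not tested; subject is just a field I would pass to the email action):

| lookup exception.csv exceptions OUTPUT threshold severity 
| where count >= threshold 
| eval subject=case(tonumber(severity)==2, "Critical Problem alert: ".exceptions, 
                    tonumber(severity)==1, "Warning Problem alert: ".exceptions, 
                    true(), "Problem alert: ".exceptions) 
| table exceptions count threshold severity subject

and then use $result.subject$ in the subject line of the email alert action, if I'm understanding the tokens correctly.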
