Dynamic Alerting based on rules from an Excel file

poddraj
Explorer

I was given the Excel sheet of rules below:

FT          OSS       ErrorCode  Priority  Min_req  Max_Req  Min_Req_Fail  Max_Req_Fail  Failure%  watch
ALCATEL_FT  ALCATEL   ALLERRORS  1         20       300      20            40            0         3
ALCATEL_FT  ALCATEL   ALLERRORS  2         20       300      41            75            0         2
ALCATEL_FT  ALCATEL   ALLERRORS  3         20       300      76            0             0         1
ALCATEL_FT  ALCATEL   ALLERRORS  4         20       0        301           0             0         1
ALCATEL_FT  ALCATEL   ALLERRORS  5         20       0        0             0             100       2
GWR_FT      GWRNG_NY  ALLERRORS  1         20       300      20            60            0         3
GWR_FT      GWRNG_NY  ALLERRORS  2         20       300      61            100           0         2
GWR_FT      GWRNG_NY  ALLERRORS  3         20       300      101           0             0         1
GWR_FT      GWRNG_NY  ALLERRORS  4         20       0        201           0             0         1
GWR_FT      GWRNG_NY  ALLERRORS  5         20       0        0             0             90        2

My logs are indexed in Splunk under index dte4fios, and the events have the fields FT, Error_Code & OSS, which correspond to the lookup file columns FT, ErrorCode & OSS.

I was given the requirement below to create a fully dynamic alert that uses the rules in the lookup file above.
1. The alert should run every 15 mins (let us assume it is running at 11 AM) and check whether any FT & Error Code combination satisfies the rules above. If so, it should send an alert containing the output of some other query.
2. Let me explain my understanding of the rules in the lookup file for one FT, ALCATEL_FT in this case. Rules should be validated in order from Priority 5 down to Priority 1.
Priority 5 Rule:
Min_req - In the last 15 mins this FT has received at least 20 hits
Max_Req - Max can be anything (0 denotes no upper limit)
Failure% - Failure% is 100
watch - 2 means I need to check whether this FT also satisfied Rule 5 in the previous 15 min interval (i.e. 10:30-10:45). Only if it did should the alert be sent.

Priority 4 Rule:
Min_req - In the last 15 mins this FT has received at least 20 hits
Max_Req - Max can be anything (0 denotes no upper limit)
Min_Req_Fail - The number of failed requests is >=301
watch - 1 means I need not check the prior 15 min interval; just send the alert

Priority 3 Rule:
Min_req - In the last 15 mins this FT has received at least 20 hits
Max_Req - In the last 15 mins this FT has received at most 300 hits
Min_Req_Fail - The number of failed requests is >=76
watch - 1 means I need not check the prior 15 min interval; just send the alert

Priority 2 Rule:
Min_req - In the last 15 mins this FT has received at least 20 hits
Max_Req - In the last 15 mins this FT has received at most 300 hits
Min_Req_Fail - The number of failed requests is >=41
Max_Req_Fail - The number of failed requests is <=75
watch - 2 means I need to check whether this FT also satisfied Rule 2 in the previous 15 min interval (i.e. 10:30-10:45). Only if it did should the alert be sent.

Priority 1 Rule:
Min_req - In the last 15 mins this FT has received at least 20 hits
Max_Req - In the last 15 mins this FT has received at most 300 hits
Min_Req_Fail - The number of failed requests is >=20
Max_Req_Fail - The number of failed requests is <=40
watch - 3 means I need to check whether this FT also satisfied Rule 1 in the previous two 15 min intervals (i.e. 10:30-10:45 and 10:15-10:30). Only if it did should the alert be sent.
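
To make the rule semantics above concrete, here is a minimal sketch in Python (not SPL) of how one FT's windows could be checked against its rules. It is only an illustration under several assumptions: the names Rule, Window, window_matches and rule_to_alert are hypothetical; a 0 in Max_Req_Fail or Failure% is taken to mean "not used"; Failure% is compared with >= on the exact ratio; and when a higher priority rule does not match, the check falls through to the next lower priority.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    priority: int
    min_req: int
    max_req: int       # 0 means no upper limit (as stated in the rules)
    min_req_fail: int
    max_req_fail: int  # 0 means no upper limit (assumption)
    failure_pct: int   # 0 means the percentage check is not used (assumption)
    watch: int         # consecutive 15-minute windows that must match


@dataclass(frozen=True)
class Window:
    total: int     # hits for the FT in one 15-minute window
    failures: int  # failed requests in that window


# ALCATEL_FT rows copied from the lookup table.
ALCATEL_RULES = [
    Rule(1, 20, 300, 20, 40, 0, 3),
    Rule(2, 20, 300, 41, 75, 0, 2),
    Rule(3, 20, 300, 76, 0, 0, 1),
    Rule(4, 20, 0, 301, 0, 0, 1),
    Rule(5, 20, 0, 0, 0, 100, 2),
]


def window_matches(rule: Rule, w: Window) -> bool:
    """Return True if a single 15-minute window satisfies the rule."""
    if w.total < rule.min_req:
        return False
    if rule.max_req and w.total > rule.max_req:
        return False
    if rule.failure_pct:
        # Assumption: "Failure% is 100" means failure percentage >= threshold.
        # Compared on the exact ratio, without rounding.
        return w.failures * 100 >= rule.failure_pct * w.total
    if w.failures < rule.min_req_fail:
        return False
    if rule.max_req_fail and w.failures > rule.max_req_fail:
        return False
    return True


def rule_to_alert(rules: Sequence[Rule], windows: Sequence[Window]) -> Optional[Rule]:
    """Check rules from highest to lowest priority.

    `windows` is ordered most recent first, e.g. [10:45-11:00, 10:30-10:45, ...].
    A rule fires only if the latest `watch` windows all satisfy it.
    """
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        recent = windows[: rule.watch]
        if len(recent) == rule.watch and all(window_matches(rule, w) for w in recent):
            return rule
    return None
```

For example, with 120 hits and 50 failures in 10:45-11:00 and 100 hits and 45 failures in 10:30-10:45, this sketch selects the Priority 2 rule, because both windows fall in the 41 to 75 failure range and watch is 2.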

This is how I need to compare my logs with the rules in the lookup table and generate an alert. The lookup table above is a sample; the real one has more rows with different FT, Error_Code values and rules.
I am pretty new to Splunk, and it would be a real help if someone could guide me in writing a query to achieve this for one FT, so that I can extend it to the entire lookup table. This is the query I have so far:

index=dte_fios sourcetype=dte2_Fios FT=*FT earliest=04/20/2020:11:00:00 latest=04/20/2020:13:00:00
| stats count as Total, count(eval(Error_Code!="0000")) AS Failure by FT
| eval Failurepercent=round(Failure/Total*100)
| table FT, Total,Failure,Failurepercent

I need help with how to get the rows for every 15 min interval for each FT & ErrorCode and then verify the rules.
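
The per-interval rows being asked for can be illustrated outside Splunk with a short Python sketch. It groups events into 15-minute windows aligned to the clock (10:30, 10:45, ...) and computes the same Total, Failure and Failurepercent values as the query above, per window and FT. The function names window_start and summarize and the (timestamp, FT, Error_Code) tuple shape are assumptions made for illustration; as in the query, any Error_Code other than "0000" counts as a failure.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 15-minute clock window."""
    return ts - timedelta(
        minutes=ts.minute % 15, seconds=ts.second, microseconds=ts.microsecond
    )


def summarize(events):
    """Aggregate events per (window start, FT).

    `events` is an iterable of (timestamp, FT, Error_Code) tuples
    (a hypothetical shape chosen for this sketch).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [total, failures]
    for ts, ft, error_code in events:
        key = (window_start(ts), ft)
        counts[key][0] += 1
        if error_code != "0000":  # same failure test as the query above
            counts[key][1] += 1
    return {
        key: {
            "Total": total,
            "Failure": failure,
            "Failurepercent": round(failure / total * 100),
        }
        for key, (total, failure) in counts.items()
    }
```

Each resulting row corresponds to one 15-minute interval for one FT, which is the shape needed before the lookup rules can be checked window by window.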

poddraj
Explorer

Can someone tell me if this use case can be implemented using Splunk?
