Alerting

How to create consistent monitoring and alerts based on high response time of my landing page?

shashank_24
Path Finder

Hi, I need to create some monitoring and alerts based on high response time of my landing page. The thing is, there are always some blips, so I want to rule those out and only trigger notifications when there is consistently high response time for a period of time, say 20 or 30 minutes.

How can I write a query like that? I have written a very generic query which gives me the average and 90th percentile response time for every 5 minutes, like below, but I want to trigger the alert only when there are consistently high response times.

Let me know if anyone has any suggestions.


index=myapp_prod sourcetype=ssl_access_combined requested_content="/myapp/products*"
| eval responseTime= responseTime/1000000 
| timechart span=5m avg(responseTime) as AverageResponseTime p90(responseTime) as 90thPercentile


As an example - let's say I want to run the alert every 30 minutes and check whether there have been consistently high response times in the last 30 minutes or 1 hour, and if so, trigger the alert to send out notifications.

Any help is appreciated.

Best Regards,
Sha


gcusello
SplunkTrust

Hi @shashank_24,

looking at your search, I suppose that responseTime is expressed in microseconds, is that correct?

I don't understand if your alert should be calculated on an average value or on a peak value.

If it's on an average value, you could run something like this:

index=myapp_prod sourcetype=ssl_access_combined earliest=-30m@m latest=@m requested_content="/myapp/products*"
| eval responseTime = responseTime/1000000
| stats avg(responseTime) AS avg_responseTime
| where avg_responseTime>60*30

If instead you want a peak value:

index=myapp_prod sourcetype=ssl_access_combined earliest=-30m@m latest=@m requested_content="/myapp/products*"
| eval responseTime = responseTime/1000000
| stats max(responseTime) AS max_responseTime
| where max_responseTime>60*30

In both cases the trigger condition is that there are results (number of results greater than 0).

if you want 60 minutes instead of 30, replace the last number in the last row.
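
On the scheduling side, here is a minimal sketch of how such an alert could be defined in savedsearches.conf, assuming a 30-minute cadence and an email action; the stanza name, recipient address, and 5-second threshold are placeholders, not values from this thread:

[High response time - myapp products]
enableSched = 1
# run every 30 minutes over the previous 30-minute window (assumed cadence)
cron_schedule = */30 * * * *
dispatch.earliest_time = -30m@m
dispatch.latest_time = @m
# fire when the search returns any results, as described above
counttype = number of events
relation = greater than
quantity = 0
# hypothetical notification action
action.email = 1
action.email.to = ops-team@example.com
search = index=myapp_prod sourcetype=ssl_access_combined requested_content="/myapp/products*" \
| eval responseTime = responseTime/1000000 \
| stats avg(responseTime) AS avg_responseTime \
| where avg_responseTime > 5

The same settings can also be configured through the Save As Alert dialog in Splunk Web instead of editing the .conf file directly.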

Ciao.

Giuseppe


shashank_24
Path Finder

Hi @gcusello, thanks for your response. The query you shared is the one I already have. It looks at the average response time over 30 minutes and then sends the alert, BUT what I am interested in is looking at every 5 minutes over the last 30 minutes and checking whether the response time is consistently high in each 5-minute window; only then should the alert trigger. This way we rule out the outliers or small blips.

Because what happens is, let's say we have a small blip over 5 minutes and the response time comes back to normal after that; in that case I don't want to get notified, because the issue resolved itself. But if the high response time is consistent, there may be an actual issue and our Ops team needs to be notified.

Hope I am making sense. And yes, the time is in microseconds.


gcusello
SplunkTrust

Hi @shashank_24,

your approach is correct:

you could create two alerts:

  • one every 5 minutes,
  • one every 30 minutes,

obviously the thresholds must be different, otherwise the second one isn't useful.
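
If a single alert is preferred instead of two, another option is to bucket the last 30 minutes into 5-minute windows and only return a result when every window is above the threshold, which is what rules out one-off blips. A minimal sketch, where the 5-second threshold is an assumption to be tuned for the landing page:

index=myapp_prod sourcetype=ssl_access_combined earliest=-30m@m latest=@m requested_content="/myapp/products*"
| eval responseTime = responseTime/1000000
| bin _time span=5m
| stats avg(responseTime) AS avg_responseTime BY _time
| eval breached = if(avg_responseTime > 5, 1, 0)
| stats sum(breached) AS breached_windows count AS total_windows
| where breached_windows == total_windows AND total_windows == 6

With the trigger condition set to "number of results is greater than 0", this only fires when all six 5-minute windows breach the threshold; relaxing the last line to something like breached_windows >= 5 would tolerate a single quiet window.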

Ciao.

Giuseppe
