How to Create an Alert That Triggers When Response Time > 5 Seconds

skoelpin
SplunkTrust

I'm using ...| transaction to group together a web service request and response. I'm then finding the avg(duration) from the response and request. This works successfully: it groups the request and response into one event, adds a new field called duration, and shows the response time.

I want to find all events that have a response time greater than 5 seconds. Most of the events are in the 1 second range, but there are a few that are above 5 seconds. Whenever I add ...| where duration > 5 at the end, it still shows me events that have a duration of less than 5 seconds. Nothing is being excluded, and I'm getting the same number of events back as I did without the where clause.

Here's my search

index=unleashed Call="<CreateOrder*" | transaction GUID startswith="fterReceiveRequest" endswith="BeforeSendReply" | timechart avg(duration)  | WHERE duration>5

jkat54
SplunkTrust

I'm taking back my comment Rich...

Why would you run transaction into a timechart? I've tried to come up with several examples, but none of them hold up logically.

My first attempt (again, we lose duration after the transaction):

Although "duration" will be available after transaction, the timechart will fail. The reason is that transaction takes all the events that share a GUID and combines them into one large event. At that point the original key values are mostly gone, except for host, source, sourcetype, index, etc.

index=unleashed Call="<CreateOrder*" duration>5 | stats avg(duration) AS duration by GUID | transaction GUID startswith="fterReceiveRequest" endswith="BeforeSendReply" | timechart avg(duration)

Note that "average duration" will not be a true average because you've eliminated durations of 5 or less.

So here's my 2nd attempt which I thought might work:

index=unleashed Call="<CreateOrder*" duration>5  | stats avg(duration) AS duration by GUID 

But now I truly see the issue... they're having an issue matching duration to GUID because the GUID spans all the events but the duration doesn't.

So now I feel like this might do it but we probably need more info:

     index=unleashed Call="<CreateOrder*" duration=* | timechart avg(duration) AS duration by GUID  | where duration>5 

Or simplified:


     index=unleashed Call="<CreateOrder*" duration>5 | timechart avg(duration) AS duration by GUID  
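
One detail that may explain the original behaviour, offered as a hedged sketch rather than a confirmed diagnosis: after timechart avg(duration), the result rows are time buckets whose only value column is named avg(duration), so a later where duration>5 is no longer testing the per-transaction duration field. If the goal is to filter the charted averages, renaming the aggregate and filtering on that name is one option. The field name avg_duration is a hypothetical label, not something from the original search.

```spl
index=unleashed Call="<CreateOrder*"
| transaction GUID startswith="fterReceiveRequest" endswith="BeforeSendReply"
| timechart avg(duration) AS avg_duration
| where avg_duration>5
```

Note that this keeps the time buckets whose average exceeds 5 seconds; it does not list the individual slow transactions.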

skoelpin
SplunkTrust

Ahh, rookie move on my end. It worked when I removed the timechart command. For clarification, I was using the timechart command to chart the average response time for the Create Order web service call. I was finding the response time by taking the difference between the response and the request. I'm going to create a separate search for my alert, which will not have the timechart command. Thanks for the help!

jkat54
SplunkTrust

Was it this one?

   index=unleashed Call="<CreateOrder*" duration>5  | stats avg(duration) AS duration by GUID 

skoelpin
SplunkTrust

It was this

index=unleashed Call="<CreateOrder*"  | transaction GUID startswith="fterReceiveRequest" endswith="BeforeSendReply" | where duration>5
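
For the alert itself, a minimal sketch built on the search above might add a table so the alert results show only the fields of interest. The choice of columns and the sort order are assumptions for illustration; the base search and the transaction options are unchanged from the thread.

```spl
index=unleashed Call="<CreateOrder*"
| transaction GUID startswith="fterReceiveRequest" endswith="BeforeSendReply"
| where duration>5
| table _time GUID duration
| sort -duration
```

Saved as an alert, this could use a trigger condition such as "Number of Results is greater than 0", so the alert fires only when at least one request/response pair took longer than 5 seconds.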

richgalloway
SplunkTrust

Have you tried putting the where clause before timechart?
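
As a sketch of this suggestion: duration exists on each event produced by transaction, so filtering there and charting afterwards keeps the where clause working on the per-transaction value. The output names slow_requests and avg_slow_duration are hypothetical.

```spl
index=unleashed Call="<CreateOrder*"
| transaction GUID startswith="fterReceiveRequest" endswith="BeforeSendReply"
| where duration>5
| timechart count AS slow_requests avg(duration) AS avg_slow_duration
```

The resulting chart covers only the slow transactions, so the average shown is not the overall average response time.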
