Splunk Search

How to alert when Processor Time exceeds a certain limit for a given length of time

tusharsappal
Explorer

Hello,
I want to check whether my processor has exceeded a certain percentage for a given length of time, and then send an alert.
My search query is below. Could you please check it and correct it if possible?

index=windows sourcetype="perfmon:cputime" counter="% Processor Time" | timechart span="1s" avg(Value) | search count >10 by host

But I am not sure where in this query to check whether the value exceeds a certain percentage.


lguinn2
Legend

Note that the perfmon data is not collected every second, so a span of "1s" in the timechart does not make sense.

What if you had a search that ran every 10 minutes and alerted for any host that had an average processor time in excess of 90%?

index=windows sourcetype="perfmon:cputime" counter="% Processor Time" earliest=-13m latest=-3m
| stats avg(Value)  as AvgProcessorTime by host
| where AvgProcessorTime > 90

You can set the alert condition to # results > 0, since the search returns one result for each host that has a high AvgProcessorTime. Note that there is a 3-minute "lag" in my search (latest=-3m); it allows time for the data to be collected and returned across the environment. You can eliminate it if you want, but then the most recent events may not have arrived yet, which will affect the accuracy.
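For reference, the question's timechart can be made to work by using a span no finer than the perfmon collection interval and splitting by host inside timechart itself, rather than with a later search command. A minimal sketch, where span=5m and the 4-hour window are arbitrary choices, and limit=0 only lifts timechart's default cap on the number of host series shown:

```spl
index=windows sourcetype="perfmon:cputime" counter="% Processor Time" earliest=-4h
| timechart span=5m limit=0 avg(Value) by host
```

This is for viewing the trend per host; the stats/where search above is the one to schedule as the alert.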

saurabh_tek
Communicator

@lguinn2 This query works for detecting sudden CPU spikes, but from an impact perspective the business is more interested in whether CPU goes above the threshold and stays there for 3 or more minutes, which might impact server performance. Any pointers on how to do that?

rahulkumarfgf
Explorer

Hi @saurabh_tek

I am trying to solve the same problem you mentioned. I hope you were able to resolve it; if so, could you please let me know how you handled it? There are several threads with similar questions, but none of the suggestions actually worked.

Thanks!
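One way to approach the "stays above the threshold for 3 or more minutes" case is to average per host per minute and count consecutive over-threshold minutes with streamstats. A minimal sketch, assuming a 90% threshold, 1-minute buckets, a run of 3 minutes, and the same 10-minute window with a 3-minute lag as the accepted answer (none of these values come from the posters). bin buckets events by minute, stats averages each host-minute, eval flags minutes above 90%, and streamstats reset_on_change=true ... by host over restarts the count whenever the host or the flag changes, so consecutive is the length of the current run:

```spl
index=windows sourcetype="perfmon:cputime" counter="% Processor Time" earliest=-13m@m latest=-3m@m
| bin _time span=1m
| stats avg(Value) as AvgProcessorTime by host _time
| eval over=if(AvgProcessorTime > 90, 1, 0)
| sort 0 host _time
| streamstats reset_on_change=true count as consecutive by host over
| where over=1 AND consecutive>=3
| stats max(consecutive) as MinutesAbove90 max(_time) as LastHighMinute by host
| eval LastHighMinute=strftime(LastHighMinute, "%Y-%m-%d %H:%M")
```

With the alert condition set to # results > 0, this fires for any host that had at least 3 consecutive high minutes somewhere in the window. Minutes with no perfmon data produce no row, so a gap in collection does not break a run; check for that separately if it matters in your environment.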

tusharsappal
Explorer

Thanks for the response; the query worked well. I have one more question. As far as I know, Splunk only sends the count of events that occurred during the time window. Is there any way to include the actual matching content in the email when the alert fires? That is, can we make the reporting more intuitive and clear by sending the matching text in the email body (not for perfmon data, but when parsing logs)?

Thanks in advance,
Tushar
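Alerts are not limited to sending a count: the rows the alert search returns can be included in the email. A minimal sketch for the log case, where my_app_index, my_app_log, and the "ERROR" term are hypothetical placeholders for your own data; ending the search with table turns each matching event's raw text into one row:

```spl
index=my_app_index sourcetype=my_app_log "ERROR" earliest=-15m@m latest=-5m@m
| table _time host source _raw
```

In the alert's email action, enable the option to include the results inline (or to attach them as a CSV file) so these rows appear in the message. The message text can also reference fields of the first result with tokens such as $result.host$.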
