Alerting

Need a Splunk alert that fires when CPU %, mem use %, OR disk use % > 75% (while also indicating the top offending processes)

spluzer
Communicator

Hello Splunkers. Noob here. I have an alert that fires when any of the three metrics listed in the title goes above 75%. I just need to add to the alert the top offending processes that are causing the overages. Here is my query so far, which does work to fire when any of the three main metrics goes above 75%:

index=blah (sourcetype="PerfDisk" OR sourcetype="PerfCPU" OR sourcetype="PerfMem" OR sourcetype="PerfProcess") (host=blah OR host=blah OR host=blah OR host=blah) earliest=-5m
| stats avg(%CommittedBytes) as mem_use_prcnt
        avg(cpuLoadPerc) as cpu_load_prcnt
        avg(%DiskTime) as disk_utilization_prcnt
        by host
| eval fire_it_up = case(cpu_load_prcnt > 75, 1,
                         mem_use_prcnt > 75, 1,
                         disk_utilization_prcnt > 75, 1,
                         true(), 0)
| where fire_it_up > 0
| table host mem_use_prcnt cpu_load_prcnt disk_utilization_prcnt

Any ideas on getting the top offending processes causing the overages? Any help is much appreciated.


Sukisen1981
Champion

hi @spluzer
"I just need to add into the alert what top offending processes are causing the overages" - well, then you need to capture the process names under CPU, memory, or disk. I am sure that's mentioned in your events somewhere?
You can't just go by sourcetype; all that would tell you is that when CPU spikes above 75%, it came from the PerfCPU sourcetype.
Perhaps you have more granular detail than that, like which process names show up under those sourcetypes?
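
For example (just a sketch, assuming your process events expose the process name in an instance field and a %_Processor_Time counter, which isn't confirmed above), something along these lines would surface the heaviest process per host over the same window:

index=blah sourcetype="PerfProcess" NOT (instance IN (_Total, Idle)) (host=blah) earliest=-5m
| stats avg(%_Processor_Time) as proc_cpu by host instance
| sort -proc_cpu
| dedup host
| table host instance proc_cpu

You could then join or append that onto your threshold search by host.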


spluzer
Communicator

Here is what I ended up doing:

index=win sourcetype="Perf:logDisk" instance!=_Total (host=myhost) earliest=-5m
| eval volume = instance
| stats avg(%_Disk_Time) as "Individual_DiskUse_%" by volume host
| join type=left host
    [| search index=win sourcetype="Perf:Process" %_Processor_Time=* NOT (instance IN (_Total, Idle)) (host=myhost) earliest=-5m
    | stats avg(%_Processor_Time) as %_Processor_Time by host instance
    | sort -%_Processor_Time
    | streamstats count by host
    | where count=1
    | eval '%_Processor_Time' = round('%_Processor_Time')
    | eval Additional_InfoCPU = "Top Resource Task=" . instance . ", Task Time=" . '%_Processor_Time'
    | fields host Additional_InfoCPU ]

Then I repeated that for the other metrics (mem %, CPU %, etc.) in separate subsearches joined in the same way; a sketch of what one of those extra legs might look like is below.
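
For instance, a memory leg can be another left join on host; the counter name here (a per-process Working_Set from Perf:Process) is just a placeholder to show the shape, so swap in whatever your Perf:Process events actually carry:

| join type=left host
    [| search index=win sourcetype="Perf:Process" Working_Set=* NOT (instance IN (_Total, Idle)) (host=myhost) earliest=-5m
    | stats avg(Working_Set) as Working_Set by host instance
    | sort -Working_Set
    | streamstats count by host
    | where count=1
    | eval Additional_InfoMem = "Top Memory Task=" . instance . ", Working Set=" . round(Working_Set)
    | fields host Additional_InfoMem ]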

Then

| eval 'mem_use_%' = round('mem_use_%', 2)
| eval 'cpu_load_%' = round('cpu_load_%', 2)
| eval 'disk_utilization_%' = round('disk_utilization_%', 2)
| eval 'Individual_DiskUse_%' = round('Individual_DiskUse_%', 2)
| eval fire_alert = case('cpu_load_%' > 75, 1,
                         'mem_use_%' > 75, 1,
                         'Individual_DiskUse_%' > 75, 1,
                         true(), 0)
| where fire_alert > 0
| stats values(volume) values(Individual_DiskUse_%) by everything you want

| Table it all out
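
If this runs as a scheduled alert that triggers when the number of results is greater than 0, those last two placeholder lines just need to name the output fields. Spelled out with the field names from the snippets above (some of which are assumed), they might look like:

| stats values(volume) as volume
        values(Individual_DiskUse_%) as disk_use
        values(cpu_load_%) as cpu_load
        values(mem_use_%) as mem_use
        values(Additional_InfoCPU) as Additional_InfoCPU
        values(Additional_InfoMem) as Additional_InfoMem
        by host
| table host cpu_load mem_use disk_use volume Additional_InfoCPU Additional_InfoMem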


richgalloway
SplunkTrust

@spluzer If your problem is resolved, please accept the answer to help future readers.

---
If this reply helps you, Karma would be appreciated.