Solved: Need a Splunk alert that fires when cpu % mem_use% OR disk use % >75% (while also indicating the top offending processes)

spluzer · ‎08-06-2019

Hello Splunkers. Noob here. I have an alert that fires when any of three metrics (listed in the title) goes above 75%. I just need the alert to also show which top offending processes are causing the overages. Here is my query so far, which does work to show when any of the three main metrics goes above 75%.

index=blah (sourcetype="PerfDisk" OR sourcetype="PerfCPU" OR sourcetype="PerfMem" OR sourcetype="PerfProcess") (host=blah OR host=blah OR host=blah OR host=blah ) earliest=-5m
| stats avg(%CommittedBytes) as mem_use_prcnt
avg(cpuLoadPerc) as cpu_load_prcnt
avg(%DiskTime) as disk_utilization_prcnt
by host
| eval fire_it_up = case(cpu_load_prcnt > 75,1,
mem_use_prcnt > 75,1,
disk_utilization_prcnt > 75,1,
true() ,0)
| where fire_it_up > 0
| table host mem_use_prcnt cpu_load_prcnt disk_utilization_prcnt

Any ideas on getting the top offending processes causing the overages? Any help is much appreciated.

spluzer · ‎08-15-2019

Here is what I ended up doing:

index=win sourcetype="Perf:logDisk" instance!=_Total (host=myhost) earliest=-5m
| eval volume = instance
| stats avg(%_Disk_Time) as diskUse% by volume host
| join type=left host
[| search index=win sourcetype="Perf:Process" category=%_Processor_Time=* NOT(instance IN(_Total, Idle)) (host=myhost) earliest=-5m
| stats avg(%_Processor_Time) as %_Processor_Time by host instance
| sort -%_Processor_Time
| streamstats count by host
| where count=1
| eval %_Processor_Time=round('%_Processor_Time')
| eval Additional_InfoCPU = "Top Resource Task=" . instance . ", Task Time=" . '%_Processor_Time'
| fields host Additional_InfoCPU ]

Then I repeated that for a bunch of other metrics (mem %, CPU %, etc.) in separate subsearches.

Then

| eval "mem_use_%" = round('mem_use_%', 2)
| eval "cpu_load_%" = round('cpu_load_%', 2)
| eval "disk_utilization_%" = round('disk_utilization_%', 2)
| eval "Individual_DiskUse_%" = round('Individual_DiskUse_%', 2)
| eval fire_alert = case('cpu_load_%' > 75, 1,
'mem_use_%' > 75, 1,
'Individual_DiskUse_%' > 75, 1,
true(), 0)
| where fire_alert > 0
| stats values(volume) values(DiskUse_%) by everything you want

| Table it all out
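
For a single metric, the pattern above might be pieced together roughly as follows. This is only a sketch, not the exact search from this post: the `PerfCPU` sourcetype and `cpuLoadPerc` field are borrowed from the original question, the `Perf:Process` fields from the subsearch above, and `proc_cpu` and `top_cpu_process` are names made up here for illustration. It also assumes both sourcetypes live in the same index, which may not be true in your environment.

```
index=win sourcetype="PerfCPU" host=myhost earliest=-5m
| stats avg(cpuLoadPerc) as cpu_load_prcnt by host
| eval cpu_load_prcnt = round(cpu_load_prcnt, 2)
| where cpu_load_prcnt > 75
| join type=left host
    [ search index=win sourcetype="Perf:Process" host=myhost earliest=-5m NOT (instance IN (_Total, Idle))
    | stats avg(%_Processor_Time) as proc_cpu by host instance
    | sort 0 host -proc_cpu
    | dedup host
    | eval top_cpu_process = instance . " (" . round(proc_cpu) . "%)"
    | fields host top_cpu_process ]
| table host cpu_load_prcnt top_cpu_process
```

Here `dedup host` after the sort keeps only the highest-ranked process per host, which does the same job as the `streamstats count by host | where count=1` pair. Filtering with `where` before the join means the subsearch result is only attached to hosts that actually breached the threshold.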


Sukisen1981 · ‎08-15-2019

hi @spluzer
"I just need to add into the alert what top offending processes are causing the overages"... well then, you need to capture the process names under CPU, memory, or disk. I am sure it's mentioned in your events somewhere?
You can't just go by sourcetype; all that would tell you is that if CPU spikes above 75%, we know it's the PerfCPU sourcetype.
Perhaps you have more granular details than that, such as which process names appear under those sourcetypes?
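
One way to check whether that granularity exists is to list the busiest process instances per host. A minimal sketch, assuming a per-process sourcetype such as `Perf:Process` with an `instance` field holding the process name and a `%_Processor_Time` field (these names come from elsewhere in this thread and may differ in your data; `avg_cpu` and `rank` are illustrative names):

```
index=win sourcetype="Perf:Process" host=myhost earliest=-5m NOT (instance IN (_Total, Idle))
| stats avg(%_Processor_Time) as avg_cpu by host instance
| sort 0 host -avg_cpu
| streamstats count as rank by host
| where rank <= 3
| table host instance avg_cpu rank
```

If this returns rows, the process names are available and can be joined back onto the per-host averages; if it returns nothing, the events probably only carry host-level counters.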

richgalloway · ‎08-15-2019

@spluzer If your problem is resolved, please accept the answer to help future readers.


