Reporting

How to write a search to report the top VMs out of almost 7000 with an increasing CPU usage trend?

dkoops
Path Finder

I'm looking for a way to determine the slope of CPU usage over a set timeframe for a large number of VMs. I can calculate it for one VM, but the environment currently contains almost 7000 VMs.

I would like a report showing the top VMs with an increasing CPU usage trend.


dkoops
Path Finder

Thanks for the reply. Unfortunately, the suggested search doesn't work.
If I use
| timechart span=1d avg(Value) as yvalue by VMName
the output columns are named after each VM instead of 'yvalue', so the eval functions that follow no longer work.


lguinn2
Legend

Would this be better?

index=vcenter_script host=vcenter_statistics Type=VM MetricId=cpu.usage.average
| timechart span=1d avg(Value) as yvalue by VMName
| eventstats count as numevents sum(_time) as sumX sum(yvalue) as sumY sum(eval(_time*yvalue)) as sumXY 
                      sum(eval(_time*_time)) as sumX2 sum(eval(yvalue*yvalue)) as sumY2 by VMName
| eval slope=((numevents*sumXY)-(sumX*sumY))/((numevents*sumX2)-(sumX*sumX))
| eval yintercept= (sumY-(slope*sumX))/numevents
| eval newY=(yintercept + (slope*_time))
| delta newY p=1 as Slp
| stats avg(Slp) as avgSlope by VMName
| sort -avgSlope

martin_mueller
SplunkTrust

You'd probably want to replace the timechart with this:

...
| bucket span=1d _time
| stats avg(Value) as yvalue by VMName _time
...

Then the following calculations should work; add a timechart to the end if required.
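
Putting the pieces together, a minimal sketch of the whole search might look like the one below. It is untested against this data and rests on a few assumptions: the index, host, and field names are taken from the searches in this thread; x, n, and slope are names introduced here for illustration; and time is rescaled to days so the sums stay small and the slope comes out in CPU units per day. Using stats rather than eventstats collapses each VM to a single row, which should be much cheaper than running map once per VM.

```
index=vcenter_script host=vcenter_statistics Type=VM MetricId=cpu.usage.average
| bin span=1d _time
| stats avg(Value) as yvalue by VMName _time
| eval x=_time/86400
| stats count as n sum(x) as sumX sum(yvalue) as sumY sum(eval(x*yvalue)) as sumXY sum(eval(x*x)) as sumX2 by VMName
| where n > 1
| eval slope=((n*sumXY)-(sumX*sumY))/((n*sumX2)-(sumX*sumX))
| sort 20 -slope
```

Because the fitted trendline is straight, its day-to-day delta is the same for every pair of consecutive days and equals this per-day slope, so a separate delta and avg step should not be needed. The where clause drops VMs with only one daily value, for which the slope is undefined, and sort 20 keeps only the 20 steepest upward trends.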

Note that this ignores days with zero events for a VMName instead of counting them as zero. If that's relevant for your data, there's a bit more work to be done.
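
If missing days should count as zero, one possible approach (a sketch only, not verified on this data) is to keep the timechart, fill the empty cells, and then turn the wide table back into one row per VM and day with untable. The field name yvalue is reused from the searches in this thread; limit=0 is there so that timechart does not fold most of the 7000 VMs into an OTHER column.

```
index=vcenter_script host=vcenter_statistics Type=VM MetricId=cpu.usage.average
| timechart span=1d limit=0 avg(Value) by VMName
| fillnull value=0
| untable _time VMName yvalue
```

The result has the fields _time, VMName, and yvalue, so the per-VM sum and slope calculations can run on it unchanged. Expect the intermediate table to be very wide with this many VMs, and check whether a zero really is the right reading for a day without samples (a powered-off VM, for example) before relying on it.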


dkoops
Path Finder

Well, for one VM I use the macro "lineartrend(2)" to get a trendline and then use the 'delta' command to get the slope of the trendline. (The macro already generates a 'slope' field, but that's not the value I want.)

Currently I do have a way to calculate it for all VMs: the 'map' command repeats the macro for every VM, but as you can imagine it's really inefficient. My current query takes about 16 hours to complete, but I'm sure someone knows a more efficient way.

Query I use:
host=vcenter_platform Type=VM
| dedup VMName | table VMName
| map maxsearches=9999 search="search index=vcenter_script host=vcenter_statistics Type=VM VMName=$VMName$ MetricId=cpu.usage.average
| timechart span=1d avg(Value) as yvalue
| `lineartrend(_time,yvalue)`
| delta newY p=1 as Slp
| stats avg(Slp) as $VMName$"
| addtotals col=t row=f labelfield=total
| search total=Total
| fields - total
| transpose


martin_mueller
SplunkTrust

Do post how you calculate that for one VM.
