Splunk IT Service Intelligence

How to forecast for multiple hosts individually

clowdmike
New Member

Hello all,

I'm trying to get some predictive alerts working. My only problem is that the search I've written is limited to a single host, and I need to manage 2300 servers.
This part of the search identifies outliers in CPU usage:

index=perfmon host=<server> counter="% Processor Time" 
| timechart span=5min avg(Value) 
| predict "avg(Value)" as prediction algorithm=LLP holdback=2 future_timespan=2 period=288 upper95=upper95 lower95=lower95 
| `forecastviz(4, 2, "avg(Value)", 95)` 
| eval isOutlier=if('avg(Value)' > 'upper95(prediction)', 1, 0)

The following restricts the results to the last 30 minutes:

| eval eTime=relative_time(_time, "-0M") | eval lTime=relative_time(now(), "-30M") | where eTime>=lTime 

My plan is to schedule this search to run every 30 minutes, and to alert/email when 'isOutlier=1'. This works great if I only have 1 server, or if I want to group them all together as a single object. But does anyone know of a way to apply this with a wildcard, and have it evaluate each host independently of the others?

0 Karma

paranjith
Explorer

Try this:

 index=perfmon host=* counter="% Processor Time" 
 | timechart span=5min avg(Value) 
 | predict "avg(Value)" as prediction algorithm=LLP holdback=2 future_timespan=2 period=288 upper95=upper95 lower95=lower95 
 | `forecastviz(4, 2, "avg(Value)", 95)` 
 | eval isOutlier=if('avg(Value)' > 'upper95(prediction)', 1, 0)
 | table host, isOutlier | search isOutlier=1

For the trigger conditions, set the alert to trigger when the number of results is greater than 0, and to trigger once for each result. Enable throttling if you don't want to receive multiple email alerts.

0 Karma

clowdmike
New Member

That doesn't seem to do the trick either.

I ran this search on 3 hosts individually (server1, server2, server3). Then I ran it with server* as I originally had it (which averages the data from the 3 servers), and with your modification (line 6).

The host field is returned with your modification, but its value is null. Outliers are also only detected against the combined average. (I verified this by including: table host, isOutlier, avg(Value)) The CPU usage matched the average of the 3 servers rather than any one server's CPU at the time of an outlier.

0 Karma
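
The null host and the combined average described above are what timechart produces without a by clause: it aggregates every matching event into one series per time bucket and does not carry the host field forward. One possible way to evaluate each host separately is sketched below. Note that it swaps predict (LLP) for a different technique, a per-host rolling mean plus two standard deviations computed with streamstats. The field names avg_cpu, baseline, sd and upper, the 2-sigma threshold, and the 288-bucket (24 hours at 5 minutes) window are illustrative assumptions, not anything confirmed in this thread.

```
index=perfmon host=server* counter="% Processor Time"
| bin _time span=5min
| stats avg(Value) as avg_cpu by _time, host
| sort 0 _time
| streamstats global=f window=288 current=f avg(avg_cpu) as baseline stdev(avg_cpu) as sd by host
| eval upper=baseline + 2*sd
| eval isOutlier=if(avg_cpu > upper, 1, 0)
| where _time >= relative_time(now(), "-30m") AND isOutlier=1
| table _time, host, avg_cpu, upper
```

Here global=f asks streamstats to keep a separate window for each host, and current=f leaves the bucket being tested out of its own baseline. The search time range needs to reach back at least a day for the window to fill, and the threshold would likely need tuning against real data before alerting on it.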