Hello all,
I'm trying to get some predictive alerts working. My only problem is that the search I've written is limited to a single host, and I need to manage 2,300 servers.
This part of the code effectively identifies outliers in CPU usage.
index=perfmon host=<server> counter="% Processor Time"
| timechart span=5min avg(Value)
| predict "avg(Value)" as prediction algorithm=LLP holdback=2 future_timespan=2 period=288 upper95=upper95 lower95=lower95
| `forecastviz(4, 2, "avg(Value)", 95)`
| eval isOutlier=if('avg(Value)' > 'upper95(prediction)', 1, 0)
The following restricts the results to the last 30 minutes:
| eval eTime=relative_time(_time, "-0M") | eval lTime=relative_time(now(), "-30M") | where eTime>=lTime
My plan is to schedule this search to run every 30 minutes, and to alert/email when 'isOutlier=1'. This works great if I only have 1 server, or if I want to group them all together as a single object. But does anyone know of a way to apply this with a wildcard, and have it evaluate each host independently of the others?
Try this:
index=perfmon host=* counter="% Processor Time"
| timechart span=5min avg(Value)
| predict "avg(Value)" as prediction algorithm=LLP holdback=2 future_timespan=2 period=288 upper95=upper95 lower95=lower95
| `forecastviz(4, 2, "avg(Value)", 95)`
| eval isOutlier=if('avg(Value)' > 'upper95(prediction)', 1, 0)
| table host, isOutlier | search isOutlier=1
For the trigger conditions, set the alert to trigger when the number of results is greater than 0, and to trigger once for each result. Add throttling if you don't want to receive multiple email alerts.
That doesn't seem to do the trick either.
I ran this search on 3 hosts individually (server1, server2, server3). Then I ran it with server* as I originally had it (which averages the data from the 3 servers), and with your modification (line 6).
The host field is returned with your modification, but its value is null, and the search only detects outliers in the combined average. I verified this by including: table host, isOutlier, avg(Value). The CPU usage matched the average of the 3 servers rather than any one server's CPU at the time of an outlier.
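The null host makes sense: timechart without a by clause collapses every host into a single avg(Value) series, so there is no per-host data left by the time predict runs, and as far as I know predict has no by clause of its own. One possible workaround, offered only as an untested sketch, is to drop predict and compute a per-host baseline with streamstats instead. Note that this swaps the LLP forecast for a simple rolling mean plus two standard deviations, so it is not equivalent to upper95. The field names cpu, baseline, spread and upper are made up for illustration, and the 288-sample window (24 hours of 5-minute buckets) and the factor of 2 are assumptions to tune:

```
index=perfmon host=server* counter="% Processor Time"
| bin _time span=5m
| stats avg(Value) as cpu by _time, host
| streamstats global=f window=288 current=f avg(cpu) as baseline stdev(cpu) as spread by host
| eval upper=baseline + 2 * spread
| eval isOutlier=if(cpu > upper, 1, 0)
| where isOutlier=1 AND _time >= relative_time(now(), "-30m")
| table _time, host, cpu, upper
```

The global=f option is what keeps a separate rolling window for each host; without it, the window is shared across all hosts. The search time range needs to cover at least the baseline window (24 hours here) for the threshold to be meaningful. Because every result row carries its own host value, the "trigger for each result" alert setting with throttling on host should then fire per server.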