Currently we have 6 hosts, each carrying approx. 16.7-16.9% of the load. An alert needs to be triggered when the load on any one host drops below 11%, and also when a host is unavailable or unreachable.
When I use a top limit with an alert condition of "number of results < 6", the wrong alerts get triggered.
I need some guidance.
hello there,
assuming the count of events is the metric you are calculating, the search below answers your question.
otherwise, you can use the same idea to capture whatever metric you are working with (maybe disk growth or another parameter):
| tstats count as event_count where index=* by splunk_server
| eventstats sum(event_count) as events
| eval percent = round(event_count/events*100, 2)
now you can save this as an alert that fires when percent < 11,
or add a where clause to the search:
| where percent < 11
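to make the arithmetic behind that search concrete, here is a minimal Python sketch of the same percent-of-total calculation and the < 11 threshold. the host names and event counts are made up for illustration, not real tstats output:

```python
# Hypothetical per-host event counts, standing in for the tstats results.
event_counts = {
    "idx1": 180, "idx2": 175, "idx3": 172,
    "idx4": 168, "idx5": 170, "idx6": 95,  # idx6 is under-loaded
}

# Mirror of: eventstats sum(event_count) as events
total = sum(event_counts.values())

# Mirror of: eval percent = round(event_count/events*100, 2)
percent = {h: round(c / total * 100, 2) for h, c in event_counts.items()}

# Mirror of: where percent < 11 -- these hosts should fire the alert
under_loaded = [h for h, p in percent.items() if p < 11]
print(under_loaded)  # ['idx6']
```

the key point is that the threshold is applied to each host's share of the total, so a host can fire the alert even while its absolute event count looks healthy.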
i assume that by "host" you mean a splunk indexer.
if that is true, there are plenty of ways to detect that an indexer is down.
most likely you will see it in a message, but if you want an alert, you can either capture the relevant events in the _internal index, or do something quick and dirty like:
| tstats dc(splunk_server) as indexers_up
or
| tstats latest(_time) as last_seen by splunk_server
| eval last_seen = strftime(last_seen, "%c")
if you see fewer than 6 in a given time period, you probably want to check. obviously, you can create a search that tells you which one is "missing", but since you only have 6 indexers, finding it by hand will be quick and easy.
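the "which one is missing" check can be sketched in Python as well — a toy version that flags indexers which never reported, plus indexers whose last event is older than a staleness window. the host names and the 300-second window are assumptions for illustration:

```python
# Expected indexers and a staleness window (both assumed for this example).
EXPECTED = {"idx1", "idx2", "idx3", "idx4", "idx5", "idx6"}
STALE_AFTER = 300  # seconds without events before a host counts as down

def missing_indexers(last_seen, now, expected=EXPECTED, stale_after=STALE_AFTER):
    """Return hosts that never reported plus hosts whose last event is too old.

    last_seen maps host name -> epoch seconds of its latest event,
    i.e. the equivalent of `tstats latest(_time) as last_seen by splunk_server`.
    """
    silent = expected - set(last_seen)                              # never reported
    stale = {h for h, t in last_seen.items() if now - t > stale_after}  # too old
    return sorted(silent | stale)

now = 1_000_000
last_seen = {"idx1": now - 10, "idx2": now - 20, "idx3": now - 5,
             "idx4": now - 30, "idx5": now - 400}  # idx6 never reported
print(missing_indexers(last_seen, now))  # ['idx5', 'idx6']
```

checking staleness as well as presence matters: an indexer that stopped sending events 10 minutes ago still appears in the last_seen table, so counting rows alone would miss it.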
hope it helps