I currently have some alerts being triggered when they shouldn't be. The search performs a host alive check, where the host sends an event every second. The search looks over a period of 1 minute, so it should return 60 events. An email is sent if the event count is < 50. The search is scheduled with a cron expression to run every minute.
The time frame of the alert is -3@m to -2@m; this was set to make sure there wasn't an issue with searching the most recent events.
Randomly the alert will be triggered, but when I view the alert, 60 events are shown in the search. Also when I run the search manually, 60 events are returned.
Looking into this further, I have looked in _audit for events related to the specific search. What I have noticed is that when the alert is triggered, the search results return nothing.
Sample of normal result:
action=search, info=completed, search_id=<SEARCH_ID>, total_run_time=0.46, event_count=60, result_count=60, avaliable_count=60, scan_count=60, drop_count=60
When an alert is triggered:
action=search, info=completed, search_id=<SEARCH_ID>, total_run_time=0.45, event_count=0, result_count=0, avaliable_count=0, scan_count=0, drop_count=0
I can't work out why the search is returning 0 results. To me it appears as if the search didn't run or was unable to run correctly.
If the alert returns 0 results, I need to rerun the alert.
Can anybody tell me how to do it?
I have added this to the saved search and the delay is 0; there are brief periods when it is 1 second.
Use the query below to find out what the result count was when the search was executed. The result_count and fired fields can give more insight.
index=_internal source=*scheduler.log savedsearch_name="Name of your saved search"
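If it helps, the scheduler query can be expanded a little to show whether the scheduled run was dispatched late or ended in an unusual status. This is only a sketch: the saved search name is a placeholder, and it assumes your scheduler.log events carry the usual scheduled_time, dispatch_time, status, run_time, result_count and alert_actions fields, so drop any that your version does not extract. The dispatch_lag field is a name made up here for illustration.

```
index=_internal source=*scheduler.log savedsearch_name="Name of your saved search"
| eval dispatch_lag = dispatch_time - scheduled_time
| table _time scheduled_time dispatch_time dispatch_lag status run_time result_count alert_actions
| sort - _time
```

Rows where result_count is 0 can then be compared against the rows around them to see whether status or dispatch_lag looks different for the runs that fired the alert.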
I ran the above search for the saved search over a period when the alert was triggered. The result count was 0; when I kicked off the search manually over the same time frame, I got 60 events back.
I will give that a shot and see what the result is, although I attempted to account for an index delay by using a time range of -10@m to -9@m. Alerts were still being triggered, with the audit logs showing the same result.
Check whether there is any delay in indexing the events, which would mean that when the search ran, the events were not there yet and the alert fired. By the time you checked manually, the events might have arrived.
You can do the following to find the lag in your events:
| eval delay = _indextime - _time
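As a fuller sketch of the same idea, the eval can be combined with a per-minute summary so that the event count and the worst indexing lag are visible side by side. The index and host values below are placeholders (they are not from the original search), and the time range is just an example window; max_delay is an illustrative field name.

```
index=<your_index> host=<your_host> earliest=-15m@m latest=-2m@m
| eval delay = _indextime - _time
| timechart span=1m count max(delay) AS max_delay
```

A minute that shows 60 events but a max_delay larger than the offset used in the alert's time range would suggest the events were not yet indexed when the scheduled search ran.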