I have an alert that works as expected: it captures a log event stating that a service is down. However, we have started to receive many false positives because the service automatically recovers within seconds. I would like to change the alert so that, instead of sending a notification immediately, it waits 30 seconds, searches for a recovery event, and sends the notification only if no recovery is found.
edit:
index=networklogs (host=foo10* OR host=foo11*) ("member" AND "monitor status down")
|rex "monitor status\s+(?<State>\w+)"
|rex "member /Common/(?<trpHost>[^:]+):53"
|eval Identifier=trpHost + " dropped out of the VIP pool"
|eval Summary="Critical Infrastructure - Server dropped out of the VIP pool. Pool member is " + State + "."
|eval ProcessID="foo"
|eval Severity=if(State=="down", 5, 1)
|eval Type=if(State=="down", 1, 2)
|eval OwnerGID=1000
|eval ForceUpdateFields="Severity,Type,Summary"
|eval Submitter="foo"
|eval LOB="IP"
|eval AlertGroup="VIP Member Dropped out"
|eval Agent="rdns"
...
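For illustration, the 30-second grace period described above could be sketched by searching for both the down and the recovery events, keeping only the most recent state per pool member, and alerting only when that state is still "down" and at least 30 seconds old. This is a minimal sketch, not a tested alert: it assumes the recovery event has the same format as the down event but reads "monitor status up", and the field name lastChange is made up for this example.

```
index=networklogs (host=foo10* OR host=foo11*) "member" ("monitor status down" OR "monitor status up")
| rex "monitor status\s+(?<State>\w+)"
| rex "member /Common/(?<trpHost>[^:]+):53"
``` Keep only the most recent state seen for each pool member (lastChange is a hypothetical field name) ```
| stats latest(State) AS State latest(_time) AS lastChange BY trpHost
``` Alert only if the member is still down and the down event is at least 30 seconds old ```
| where State=="down" AND now() - lastChange >= 30
```

The remaining eval statements from the original search would follow the where clause unchanged. Because stats keeps only State, lastChange, and trpHost, any other field needed later would have to be added to the stats command. With a scheduled search whose time range overlaps between runs, a member that stays down would match on every run, so some form of alert throttling per trpHost is probably needed as well.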