Alerting

What is the best approach to Windows Event Log alerting?

richnavis
Contributor

I would like to know if anyone is using Splunk as the primary alerting engine for Windows Event Logs. We have several hundred servers that index the Windows Event Logs of each server, but we currently have no alerting set up for concerning events. I can envision that eventually we may want to have several hundred "alert conditions", but I am unsure of the best way to do this. My thought is that we set up a scheduled search every x minutes (5 perhaps?) where the match condition for the search is a lookup table of event IDs. The output of the lookup would then include a field with some description of what the problem may be. Is anyone doing anything similar? Does anyone have any other ideas on how to use Splunk as an effective alerting tool, not just an awesome search engine for errors?
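
To make the idea concrete, here is a minimal sketch of the lookup-table matching described above. It is plain Python rather than a Splunk search, and it only illustrates the logic. The `EVENT_LOOKUP` table, the `match_events` helper, and the sample hosts are hypothetical; the event IDs are just examples of what such a table might hold.

```python
# Hypothetical lookup table: Windows event ID -> description of the likely problem.
# The entries are illustrative; a real table would be maintained per organization.
EVENT_LOOKUP = {
    4625: "An account failed to log on",
    1102: "The audit log was cleared",
    7034: "A service terminated unexpectedly",
}


def match_events(events, lookup):
    """Return only the events whose EventCode is in the lookup,
    each annotated with the description from the table."""
    alerts = []
    for event in events:
        description = lookup.get(event["EventCode"])
        if description is not None:
            alerts.append({**event, "description": description})
    return alerts


# Hypothetical sample of events collected since the last scheduled run.
sample = [
    {"host": "srv01", "EventCode": 4625},
    {"host": "srv02", "EventCode": 4624},  # not in the table, so ignored
    {"host": "srv03", "EventCode": 1102},
]

matched = match_events(sample, EVENT_LOOKUP)
for alert in matched:
    print(alert["host"], alert["EventCode"], alert["description"])
```

Every matching event produces an alert here, regardless of how often it occurs or how urgent it is.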

0 Karma

DalJeanis
Legend

Yes, lots of us drive alerting off of Splunk, or through Splunk to another application. But your idea for an overall architecture is probably too much of a one-size-fits-all approach. Just because an event exists -- let's say a password failure -- doesn't mean that anything is wrong.

Not all alerts are created equal. Think of an alert as a report that you want someone to receive, whose purpose is to draw their attention to something. The urgency of the "something" -- and also how clearly the "something" falls outside the normal range -- will determine how frequently you need to alert on it.

1) Tell me within a day if any machine in group XYZ reached over 80% disk usage the previous day.
2) Tell me within X hours (or YY minutes) if a host becomes unavailable or stops reporting.
3) Tell me monthly if any user goes over X hours of working from home, or if any user visits certain types of websites more than X times in the 30-day period.
4) Tell me within X minutes (or YY seconds) if the number of event X per second goes outside of normal bounds.
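
As a rough illustration of how those four examples differ in cadence, here is a small Python sketch (not Splunk configuration). The `AlertRule` class, the `due_rules` helper, the rule names, and the specific intervals are all assumptions made up for the example, since the real values of X and YY depend on your environment.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    name: str
    interval_minutes: int  # how often the check runs (assumed values below)


# Hypothetical rules mirroring the four examples above.
RULES = [
    AlertRule("disk_usage_over_80_percent", 24 * 60),   # daily
    AlertRule("host_unavailable", 60),                  # hourly
    AlertRule("work_from_home_hours", 30 * 24 * 60),    # every 30 days
    AlertRule("event_rate_out_of_bounds", 5),           # every 5 minutes
]


def due_rules(rules, minutes_elapsed):
    """Return the names of the rules whose interval divides the elapsed minutes."""
    return [r.name for r in rules if minutes_elapsed % r.interval_minutes == 0]


at_one_hour = due_rules(RULES, 60)
at_one_day = due_rules(RULES, 24 * 60)
print(at_one_hour)
print(at_one_day)
```

The point is only that each alert carries its own schedule, rather than every condition being checked by one search on one interval.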

Sometimes there will be events that are probably just normal goofs, but over the medium term you have to check for patterns to make sure it's not some kind of attack (whether outsider or insider). Not this second or minute, but also not a month from now.
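
A minimal sketch of that kind of medium-term pattern check, again in plain Python rather than a Splunk search: count events per user over a window and flag only the users who repeat. The `repeated_failures` helper, the 7-day window, the threshold of 3, and the sample data are all assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def repeated_failures(events, now, window=timedelta(days=7), threshold=3):
    """Count failure events per user inside the window and return
    the users whose count reaches the threshold."""
    cutoff = now - window
    counts = Counter(e["user"] for e in events if e["time"] >= cutoff)
    return {user: n for user, n in counts.items() if n >= threshold}


# Hypothetical password-failure events.
now = datetime(2024, 1, 8)
events = [
    {"user": "alice", "time": datetime(2023, 12, 20)},  # outside the window
    {"user": "alice", "time": datetime(2024, 1, 2)},
    {"user": "alice", "time": datetime(2024, 1, 4)},
    {"user": "alice", "time": datetime(2024, 1, 6)},
    {"user": "bob", "time": datetime(2024, 1, 7)},      # a single goof
]

flagged = repeated_failures(events, now)
print(flagged)
```

A single failure stays quiet, while a repeated pattern inside the window gets reported.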

@adonio's suggestions are right on. You will build up your knowledge base and your arsenal of alerts over time, creating, adjusting, and retiring them as your business cases dictate and as what is "normal" for your organization changes.

Stick to the "Agile" way of thinking for now. Implement early and often, and adjust and reorient as you learn. Don't worry about getting it perfect; done is better than perfect, and Splunk is easy enough to modify as you go along.

0 Karma

adonio
Ultra Champion

Hello rnavis,
First, build a search that matches the KPI, SLA, or threshold you want to be alerted on. Then set up an alert by saving the search as an alert.
There are almost 100 pre-built reports and searches in the App for Windows Infrastructure; it can be a good place to start looking for reference searches and reports. More on this app here: https://splunkbase.splunk.com/app/1680/

P.S. There are many other Windows data related apps on Splunkbase. Download them and explore their searches.

Hope it helps.

0 Karma