I'm indexing logfiles from a custom web application that receives nonstop traffic, resulting in virtually nonstop log messages. Occasionally (unfortunately) the application will hang, resulting in a gap in the logging. I'd like to be alerted if this condition occurs. Any ideas on how to search/alert in realtime for "gaps" in the log file, or an absence of data?
If you just want the immediate alert (no historical reporting), you could do something like:
| metadata type=sources | search source=yoursource | eval age=now()-lastTime | search age>60
If it returns any results, your source has not reported in a longer-than-expected amount of time (60 seconds in this case).
Note: It's not clear to me whether it's preferable to use lastTime or recentTime in the eval statement. Maybe someone else can explain the difference.
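To see exactly what the eval/search pair computes, the same staleness test can be modeled outside Splunk. This is a minimal Python sketch that assumes you already have each source's lastTime as epoch seconds. The function name stale_sources and its parameters are hypothetical and are not part of Splunk.

```python
import time


def stale_sources(last_times, threshold=60.0, now=None):
    """Return sources whose newest event is more than `threshold` seconds old.

    `last_times` is a hypothetical input: a dict mapping a source name to its
    lastTime (epoch seconds), like the values `| metadata type=sources` reports.
    """
    if now is None:
        now = time.time()
    # Mirrors `eval age=now()-lastTime | search age>60`: strictly greater than.
    return sorted(src for src, last in last_times.items() if now - last > threshold)
```

For example, `stale_sources({"app.log": 1000.0, "other.log": 1090.0}, now=1100.0)` flags only `app.log`, which is 100 seconds old.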
So how does the search change to test for a count of at least 1 record? I'm new to Splunk and not sure what to add to the search noted above.
Perhaps not. Ideally I'd be able to identify 5-10 second gaps in the logs, which indicates the "hanging" condition. I'll investigate your hypothesis, thanks.
My guess (and only a guess) is that events are streaming, but the metadata only gets updated every 30 seconds. Is that really a problem though? How often do you want to run the saved search?
Unfortunately I was too quick to mark this as solved. It seems that the "age" always grows to 30s and resets to 0s, indicating that the remote Splunk servers that feed this index are only sending data every 30 seconds, even though the logs are constantly written -- does that sound plausible? If so, how do I instruct the Splunk forwarders to stream data in realtime rather than batching every 30 seconds?
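If data is forwarded in batches, the metadata age will climb to the batch interval and reset no matter what the application does, so it can't resolve a 5-10 second hang. One alternative, which nobody in this thread proposed, is to look at the spacing between the timestamps inside the events themselves; batching should change when events arrive, not the times recorded in them. This is a minimal Python sketch of that idea, assuming the event timestamps are already available as epoch seconds. The names find_gaps and max_gap are hypothetical.

```python
def find_gaps(timestamps, max_gap=5.0):
    """Return (start, end) pairs where consecutive events are more than max_gap seconds apart."""
    ordered = sorted(timestamps)  # events may arrive out of order
    gaps = []
    # Compare each event with the one immediately after it.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > max_gap:
            gaps.append((prev, cur))
    return gaps
```

Pairwise spacing only finds gaps that have already ended. A hang that is still in progress shows up as the distance between the newest event and now, which is what the metadata age check measures.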
lastTime is the timestamp of the most recent event, as recorded in the event itself.
recentTime is the last time (on the index server) that an event was received.
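A small model of the two fields may make the difference clearer. It assumes all values are epoch seconds and that the application's clock and the indexer's clock agree. The function name compare_ages is hypothetical. An age based on lastTime tells you how long ago the newest logged event happened. An age based on recentTime tells you how long ago anything was indexed. For the question in this thread (has the application stopped logging?), lastTime looks like the closer match, since recentTime can keep advancing while a forwarder delivers older, buffered events.

```python
def compare_ages(last_time, recent_time, now):
    """Return the source's age measured by event time and by index time (seconds)."""
    return {
        # How long since the newest event *timestamp* (lastTime).
        "event_age": now - last_time,
        # How long since anything was *indexed* (recentTime).
        "index_age": now - recent_time,
    }
```

For example, with `last_time=1000`, `recent_time=1030` and `now=1035`, the event-time age is 35 seconds but the index-time age is only 5.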
Elegant. Thank you very much.
Our solution for this was to run a saved search every n minutes that searches back n minutes for everything. If the results count is less than 1 we send an email. Although not real time, we use n = 5 which fits our needs.
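The scheduled-search approach comes down to counting events in a trailing window and alerting when the count is zero, which also answers the earlier question about testing for at least one record. In Splunk, a search along the lines of `source=yoursource earliest=-5m | stats count | where count<1` should return a row only when nothing arrived, because `stats count` without a by clause yields 0 rather than no rows. Treat that as an untested sketch. The Python below models the same logic, assuming event timestamps in epoch seconds. The names should_alert and window_minutes are hypothetical.

```python
def should_alert(event_times, now, window_minutes=5):
    """Return True if no events fell within the last `window_minutes` minutes."""
    window_start = now - window_minutes * 60
    # Count events inside the trailing window [window_start, now].
    count = sum(1 for t in event_times if window_start <= t <= now)
    return count < 1
```

With n = 5, the window is much longer than a 30-second forwarding batch, so batching alone shouldn't trigger a false alert. The trade-off is that a short hang can slip through unnoticed.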
This was my initial idea as well, but I figured there was a better way. It's a good way to look back at historical data, though, which also comes in handy.