Hello,
I want to check whether a process is still running. The process periodically logs a short message while it polls a directory.
I want to use that message to detect when the process is no longer running.
Normally, a simple search for the message would be:
search myprocess is doing the work
and then defining the alert for eventcount = 0 would do the job. But...
I would like to do it in a generic way, meaning that the text "is doing the work" is actually unknown. This has to work for every process that logs something periodically.
So far, I have defined a search that does a simple event count per time period (hourly):
source=/logs/processes.log earliest=-3h@h | chart count over process by _time span=1h
This gives me a table like this:
process  1381651200  1381654800  1381658400
myproc1          12          12           1
myproc2          15          15           0
myproc3         233         243         102
This means that the process "myproc2" did not log anything in the last hour, and I need an alert for that case.
I have tried to extract the last column and pick out the matching process name (the process column), but I couldn't get it to work.
Any ideas how to do that?
Maybe I'm overcomplicating this?
Regards,
Peter
After an hour of reading several posts here, I finally found a solution to this problem.
My problem is nothing more than determining the gap between the last log entry and now. This search does it for me:
source=/logs/processes.log earliest=-2h | stats max(_time) As LatestTime by process | eval Gap=(now()-LatestTime) | search Gap>3600
This means: every process that was seen in the last 2 hours must also have been seen during the last hour. If not, I can raise an alert.
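For anyone who wants to see the same logic outside Splunk, here is a minimal Python sketch of what the search above does: take the newest timestamp per process, compute the gap to "now", and keep only the processes whose gap exceeds one hour. The function name silent_processes, the sample events, and the value of now are made up for illustration; they are not taken from my real log.

```python
def silent_processes(events, now, max_gap=3600):
    """Return {process: gap_in_seconds} for processes whose newest event
    is older than max_gap seconds. `events` is an iterable of
    (process_name, epoch_timestamp) pairs (hypothetical input format)."""
    latest = {}
    for process, ts in events:
        # Same idea as: stats max(_time) AS LatestTime by process
        if process not in latest or ts > latest[process]:
            latest[process] = ts
    # Same idea as: eval Gap=(now()-LatestTime) | search Gap>3600
    return {p: now - t for p, t in latest.items() if now - t > max_gap}


# Hypothetical sample data: "now" is assumed to be one hour after the
# start of the last hourly bucket in the table above.
now = 1381662000
events = [
    ("myproc1", 1381661000),  # gap 1000 s, still alive
    ("myproc2", 1381657000),  # gap 5000 s, silent for more than an hour
    ("myproc3", 1381661900),  # gap 100 s, still alive
]

print(silent_processes(events, now))
```

As in the search, a process that has not logged anything inside the whole look-back window (2 hours here) never shows up in the input at all, so it cannot be reported. The look-back window has to be longer than the gap threshold for this to catch anything.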