Hi,
I have created a query to fetch the status of some jobs in a particular format.
Several scheduled jobs run in the environment and we want to monitor them. Some run once daily, and others run every 5, 10, or 30 minutes, and so on.
Query:
index=tomcat source="/files0/nlhyp*" [Job] cronjob earliest=@d Action=Starting
| table CronJobName _time
| rename _time as time1
| eval Actual_Start_Time=strftime(time1,"%d-%m-%Y %H:%M:%S")
| join
[
search index=tomcat source="/files0/nlhyp*" [Job] cronjob earliest=@d Action=Finished
| table CronJobName _time
| rename _time as time2
| eval Actual_End_Time=strftime(time2,"%d-%m-%Y %H:%M:%S")
]
| table CronJobName Actual_Start_Time Actual_End_Time
| dedup CronJobName
| join type=inner CronJobName [| inputlookup CronJobLookup.csv]
| dedup CronJobName
| table CronJobName Job_Frequency_min Actual_Start_Time Expected_Start_Time Actual_End_Time Expected_End_Time
| eval epoch_a=now()
| eval CurrentDate=strftime(now(),"%d-%m-%Y")
| eval epoch_b=strptime(CurrentDate." ".Expected_Start_Time,"%d-%m-%Y %H:%M:%S"), ExpectedStart=strftime(epoch_b,"%d-%m-%Y %H:%M:%S"), CurrentTime=strftime(epoch_a,"%d-%m-%Y %H:%M:%S")
| eval epoch_c=strptime(CurrentDate." ".Expected_End_Time,"%d-%m-%Y %H:%M:%S"), ExpectedEnd=strftime(epoch_c,"%d-%m-%Y %H:%M:%S"), CurrentTime=strftime(epoch_a,"%d-%m-%Y %H:%M:%S")
| table CronJobName Job_Frequency_min Actual_Start_Time ExpectedStart Actual_End_Time ExpectedEnd
| eval Expected_Start=strftime(Expected_Start_Time, "%H:%M:%S")
| eval Expected_End=strftime(Expected_End_Time, "%H:%M:%S")
| eval Status = case(
    Actual_End_Time > ExpectedEnd AND isnotnull(Actual_End_Time), "Over Run",
    Actual_End_Time < ExpectedEnd AND isnotnull(Actual_End_Time), "OK",
    isnull(Actual_Start_Time), "Not Run")
Issue 1:
Currently, for jobs with a frequency of Once, I get this result:
CronJobName | Job_Frequency_min | Actual_Start_Time | ExpectedStart | Actual_End_Time | ExpectedEnd | Status
staffAuditFeedJob | Once | 21-11-2017 06:30:01 | 21-11-2017 05:30:00 | 21-11-2017 06:30:02 | 21-11-2017 06:30:00 | Over Run
Here the actual end time is only 2 seconds after the expected end time, yet the status is Over Run. I want to allow a grace period of 5 minutes: if the job finishes more than 5 minutes after ExpectedEnd, the status should be Over Run; if it finishes within 5 minutes after ExpectedEnd, the status should be OK.
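To make the intended rule explicit, here is a minimal sketch of the grace-period check in Python rather than SPL. The function name `job_status` and the constants are hypothetical, chosen only for illustration. It parses the formatted strings back into timestamps before comparing, since comparing "%d-%m-%Y" strings directly is only reliable when both values fall on the same date.

```python
from datetime import datetime, timedelta
from typing import Optional

TIME_FORMAT = "%d-%m-%Y %H:%M:%S"  # same layout the query passes to strftime
GRACE = timedelta(minutes=5)       # assumed grace period from the requirement above


def job_status(actual_start: Optional[str],
               actual_end: Optional[str],
               expected_end: str) -> Optional[str]:
    """Return "Not Run", "Over Run", or "OK" (hypothetical helper)."""
    if actual_start is None:
        return "Not Run"
    if actual_end is None:
        # Started but not finished: the original case() has no branch for this either.
        return None
    end = datetime.strptime(actual_end, TIME_FORMAT)
    limit = datetime.strptime(expected_end, TIME_FORMAT) + GRACE
    # Finishing exactly at the limit still counts as within the grace period.
    return "Over Run" if end > limit else "OK"
```

With the row shown above (end 06:30:02, expected end 06:30:00) this rule gives OK. In the query, the equivalent would presumably be a comparison on the epoch fields (time2 against epoch_c + 300) instead of the formatted strings, which would mean keeping those fields through the table commands.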
Issue 2:
For jobs with a frequency other than Once, say every 10 minutes, the expected start and end times should vary for each run. Alternatively, the status should be calculated more accurately from the time and frequency of the job, which I have not been able to do. Can you please suggest an approach?
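One possible way to frame the calculation for periodic jobs, sketched in Python rather than SPL: derive the most recent expected start slot from the current time and the frequency, then check whether the latest actual start falls in that slot. This assumes that Job_Frequency_min is a number of minutes and that runs are aligned to midnight (00:00, 00:10, 00:20, ...), neither of which is confirmed by the lookup. The function names and the 5-minute grace period are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def latest_expected_start(now: datetime, frequency_min: int) -> datetime:
    """Most recent slot boundary at or before `now`, assuming slots start at midnight."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed_min = int((now - midnight).total_seconds() // 60)
    slots = elapsed_min // frequency_min
    return midnight + timedelta(minutes=slots * frequency_min)


def periodic_status(last_start: Optional[datetime],
                    now: datetime,
                    frequency_min: int,
                    grace: timedelta = timedelta(minutes=5)) -> str:
    """Return "OK" if the job ran in the slot it should have, else "Not Run"."""
    if last_start is None:
        return "Not Run"
    slot = latest_expected_start(now, frequency_min)
    if last_start >= slot:
        return "OK"
    if now <= slot + grace:
        # The current slot is still within its grace period,
        # so a run in the previous slot is acceptable.
        previous_slot = slot - timedelta(minutes=frequency_min)
        return "OK" if last_start >= previous_slot else "Not Run"
    return "Not Run"
```

For example, with a 10-minute frequency and a current time of 10:07, the expected slot is 10:00, so a last start of 09:50:02 would be reported as Not Run, while the same last start checked at 10:02 would still be OK.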