Hi everyone,
I have configured an alert with the cron schedule */1 * * * * so that it runs every minute. After I save the alert, the next scheduled time is set to the current time + 1 minute. Once the job has run, the next scheduled time should advance to the following minute, but that is not happening. It was working fine until yesterday, and the issue suddenly started affecting all alerts.
Please help me resolve this, friends.
You can always use * * * * * to schedule a search to run every minute, but whenever the search takes more than 1 minute to run, the scheduler will start skipping the next run. It is always better to keep a safe time interval between searches.
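To illustrate the point about skipped runs, here is a minimal sketch in Python. It is a toy model, not Splunk's actual scheduler: it assumes one run is due every 60 seconds and that a due run is skipped whenever the previous run is still executing. The function name `simulate_schedule` and the durations are made up for illustration.

```python
def simulate_schedule(run_durations, interval=60):
    """Toy model of a fixed-interval schedule (hypothetical, not Splunk code).

    run_durations[i] is how long the run in slot i would take, in seconds.
    A slot is marked "skipped" if the previous run has not finished yet;
    the duration of a skipped slot is ignored.
    """
    results = []
    busy_until = 0  # time at which the currently running search finishes
    for slot, duration in enumerate(run_durations):
        start = slot * interval
        if start < busy_until:
            results.append((start, "skipped"))
        else:
            busy_until = start + duration
            results.append((start, "success"))
    return results


# The second run takes 90 s, so it overlaps the slot at t=120.
timeline = simulate_schedule([30, 90, 30, 30])
for start, status in timeline:
    print(start, status)
```

In this model only the slot at t=120 is skipped. A longer interval (for example every 5 minutes) leaves room for an occasional slow run.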
The searches are not being skipped. From my point of view, the scheduler is hung and the next scheduled time is not getting updated for the alerts 😞
That is why my results are zero while using the following command "index=_internal sourcetype=scheduler app="postilion*" | timechart count by status"
The next scheduled time stopped at 2018-08-14 17:00:00 CDT and has not been updated since. I checked the splunkd logs and there are no errors from my applications.
I have alerts on my Monitoring console for skipped searches. I alert to Slack.
I wrote about finding skipped searches: https://answers.splunk.com/answers/514181/skipped-searches-on-shc.html
Paul Lucas gave a great talk at .conf17 on the new scheduler (as well as at our SF Bay Area Splunk user group on August 8): https://conf.splunk.com/files/2017/slides/making-the-most-of-the-splunk-scheduler.pdf The skew feature looks very useful.
Did you try "* * * * *" (five asterisks) as the cron definition?
Yes, I have tried both "* * * * *" and "*/1 * * * *" as the cron definition.
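For reference, under standard cron semantics the two definitions select the same minutes, so switching between them should not change the schedule. A minimal sketch in Python, assuming a simplified minute-field parser that only handles "*", "*/n", and comma-separated numbers (the helper `expand_minute_field` is hypothetical and is not how Splunk parses cron):

```python
def expand_minute_field(field):
    """Expand a cron minute field into the set of minutes it matches.

    Simplified, hypothetical parser: supports "*", "*/n", and
    comma-separated integers only (no ranges such as 10-20).
    """
    minutes = set()
    for part in field.split(","):
        if part == "*":
            minutes.update(range(60))
        elif part.startswith("*/"):
            step = int(part[2:])
            minutes.update(range(0, 60, step))
        else:
            minutes.add(int(part))
    return minutes


# "*/1" (every 1st minute) and "*" (every minute) match the same set.
same = expand_minute_field("*/1") == expand_minute_field("*")
print(same)
```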
So it was working yesterday and stopped working today? I'm willing to bet your searches are being skipped because your hardware can't handle the load. You should look at the internal index to verify this:
index=_internal sourcetype=scheduler status="skipped"
Agreed. Running an alert every minute can sometimes use as many resources as a real-time search does. Also, what is the search you're using for this alert?
Hi,
I checked this with the search "index=_internal sourcetype=scheduler app="postilion*" | timechart count by status" over the last 7 days. After 10 August 2018 the counts are 0 for the skipped, continued, and success statuses.
Just to make sure we are checking everything, what does your Splunk environment look like? If you're running distributed, are you forwarding your search head logs to your indexers?