We use summary indexing to improve search performance and to avoid unnecessary lookups and field extractions. It is supposed to run every 5 minutes and summarize the previous 5 minute window.
We schedule the saved search with these values:
<code>earliest = -10m@m latest = -5m@m frequency = every 5 minutes </code>
When we search index="_internal" sourcetype="scheduler", it becomes apparent that the scheduler is not firing our saved searches reliably every 5 minutes. Sometimes a search starts only 6 or 7 minutes after the previous one. This creates small gaps in the data (of 1 or 2 minutes) that are impossible to backfill with the backfill script provided, and it renders the summary index useless.
Is there a way to snap to a more accurate 5 minute window? Or a way to force the scheduler to run more reliably?
What's your setting for realtime_schedule on this saved search?
I think that in more recent releases, creating a new summary-index-generating scheduled saved search now causes realtime_schedule to be set to 0. Generally this is what you want, since it means that any missed runs get executed later (for example, after a splunkd restart). It also means that these saved searches can be delayed; however, that should not result in gaps in your summary index. If anything, it should help prevent them.
If you search your summary index for the summary events in question, you should see that search_now always reflects the precise 5 minute interval you have scheduled your searches for, whereas info_search_time reflects the real (wall clock) time at which the search was actually kicked off. So even though your search was delayed by a minute or two (which does seem high), you shouldn't be losing any data, because each search should still cover the originally designated window.
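To put numbers on that delay, you can compare the time each run was scheduled for with the time it was actually dispatched. This is a rough sketch. It assumes your scheduler events carry the scheduled_time and dispatch_time fields (both in epoch seconds; availability may depend on your Splunk version), and my_summary_search is only a placeholder for your saved search name:
<code>index="_internal" sourcetype="scheduler" savedsearch_name="my_summary_search"
| eval delay_sec = dispatch_time - scheduled_time
| table _time status scheduled_time dispatch_time delay_sec run_time result_count </code>
If scheduled_time advances in exact 300 second steps while delay_sec varies, the runs are late but each one should still cover its designated window.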
You may also want to look into your limits.conf settings for the concurrency of saved searches. (I think there are some questions about that floating around on this site already.)
BTW, are you seeing your saved search show up as "skipped"? If so, I would expect to see events being dropped. You can search with:
<code>index="_internal" sourcetype="scheduler" status=skipped </code>
Another thing to consider: Is it possible that you simply don't have any events to summarize for the 5 minute window in question? If this happens, you will see no new events in the summary index (which looks like a "gap"). This may or may not be likely based on your event data, but you should be able to confirm this very quickly with the search:
<code>index="_internal" sourcetype="scheduler" result_count=0 </code>
Of course, if you have some sort of conditional logic, then perhaps this would be a better search:
<code>index="_internal" sourcetype="scheduler" NOT alert_actions="*summary_index*" </code>
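If you'd rather confirm the gaps from the summary index side, something like the following lists the 5 minute buckets that contain no summary events at all. This is only a sketch. It assumes the events land in the default summary index and carry the search_name field, and my_summary_search again stands in for the name of your saved search:
<code>index="summary" search_name="my_summary_search"
| timechart span=5m count
| where count=0 </code>
Any bucket it returns either had nothing to summarize or belongs to a run that never wrote its results. Comparing those times against the scheduler searches above should tell you which.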