We use summary indexing to improve search performance and to avoid unnecessary lookups and field extractions. The summary search is supposed to run every 5 minutes and summarize the previous 5 minute window.

We schedule the saved search with these values:

earliest = -10m@m
latest = -5m@m
frequency = every 5 minutes

When investigating

index="_internal" sourcetype="scheduler"
it becomes apparent that the scheduler is not firing our saved searches reliably every 5 minutes. Sometimes a search will only start 6 or 7 minutes after the previous search. This creates small gaps in the data (of 1 or 2 minutes) that are impossible to backfill with the backfill script provided. It also renders the summary index useless.

Is there a way to snap to a more accurate 5 minute window? Or a way to force the scheduler to run more reliably?

asked 07 Sep '10, 08:43

stephanbuys

edited 07 Sep '10, 19:02

Lowell ♦


One Answer:

What's your setting for realtime_schedule in your savedsearches.conf entry?

I think that in more recent releases, creating a new scheduled saved search that populates a summary index causes realtime_schedule to be set to 0. Generally this is what you want, since it means that any missed runs get executed later (for example, after a splunkd restart). It also means that these saved searches can be delayed; however, this should not result in gaps in your summary index. If anything, it should help prevent them.

If you search your summary index for the summary events in question, you should see that search_now always reflects the precise 5 minute interval you have scheduled your searches for, whereas info_search_time reflects the real (wall clock) time at which the search was actually kicked off. So even though your search was delayed by a minute or two (which does seem high), you shouldn't be losing any data, because each search should still cover the originally designated window.
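To make the point about search_now concrete, here is a minimal sketch in Python. It is not Splunk code; it only models the arithmetic of earliest=-10m@m and latest=-5m@m. The function names and the sample delays are hypothetical. It compares windows computed from the scheduled time with windows computed from the delayed wall-clock start time.

```python
from datetime import datetime, timedelta

def snap_to_minute(t):
    """Model the '@m' modifier: drop seconds and microseconds."""
    return t.replace(second=0, microsecond=0)

def window(now, earliest_min=10, latest_min=5):
    """Model earliest=-10m@m and latest=-5m@m relative to `now`."""
    return (snap_to_minute(now - timedelta(minutes=earliest_min)),
            snap_to_minute(now - timedelta(minutes=latest_min)))

def has_gap_or_overlap(windows):
    """True if any window does not start exactly where the previous one ended."""
    return any(prev_end != next_start
               for (_, prev_end), (next_start, _) in zip(windows, windows[1:]))

# Three runs scheduled every 5 minutes (sample data, not real scheduler logs).
scheduled = [datetime(2010, 9, 7, 8, 0) + timedelta(minutes=5 * i) for i in range(3)]
# Assumed start delays for each run.
delays = [timedelta(0), timedelta(minutes=1, seconds=40), timedelta(seconds=20)]

# Windows anchored to the scheduled time stay contiguous despite the delays.
by_schedule = [window(s) for s in scheduled]
# Windows anchored to the actual start time drift and leave gaps or overlaps.
by_wall_clock = [window(s + d) for s, d in zip(scheduled, delays)]

print(has_gap_or_overlap(by_schedule))
print(has_gap_or_overlap(by_wall_clock))
```

In this model a delayed start only matters if the window is derived from the wall clock. When it is derived from the scheduled time, as described above, the windows line up end to end.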

You may also want to look into your limits.conf settings for the concurrency of saved searches. (I think there are some questions about that floating around on this site already.)


BTW, are you seeing your saved search show up as being "skipped"? If so, I would expect to see events being dropped. You can search with:

index="_internal" sourcetype="scheduler" status=skipped

Another thing to consider: Is it possible that you simply don't have any events to summarize for the 5 minute window in question? If this happens, you will see no new events in the summary index (which looks like a "gap"). This may or may not be likely based on your event data, but you should be able to confirm this very quickly with the search:

index="_internal" sourcetype="scheduler" result_count=0

Of course, if you have some sort of conditional logic, then perhaps this would be a better search:

index="_internal" sourcetype="scheduler" NOT alert_actions="*summary_index*"

answered 07 Sep '10, 18:57

Lowell ♦

edited 08 Sep '10, 13:23

I found some skipped saved searches using your search, but not for the day in question. I verified that the scheduled search events' scheduled_time field was correct (i.e. 5 minute intervals). Will need to dig deeper to find out why our summary index is missing events.

(08 Sep '10, 11:58) stephanbuys

realtime_schedule is set to 0 for the saved searches in question.

(08 Sep '10, 11:59) stephanbuys

Is it possible that no events occurred within a 5 minute window? I've added a search above to check for that.

(08 Sep '10, 13:25) Lowell ♦

We think we found our issue: some of the events get logged a lot later, but have a timestamp that sometimes falls in a Summary Indexing window that has already passed. At least we can confirm that Summary Indexing seems to work reliably. Will raise a new question for this backfill challenge. Thanks!

(09 Sep '10, 13:53) stephanbuys

Yeah, that can be tricky to spot. I assume you know about the _indextime field (added in 4.0), which can be quite helpful in tracking down this kind of issue. I think the general rule of thumb is to simply delay your summary indexing searches to the point at which you are certain all your events are loaded, but that may not be an option for you. (The file polling / indexing performance of 4.1 is much better than earlier versions, so if you're running an older version and you're mostly looking at events coming from log files, then upgrading may help here.) Best of luck!

(10 Sep '10, 14:22) Lowell ♦
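
The late-arrival problem from the last two comments can be sketched the same way. This is a minimal Python model, not Splunk code. The names are hypothetical: event_time stands in for an event's timestamp (_time) and index_time for the moment it was indexed (_indextime). search_delay is how long after a window closes the summary search for it runs, which is 5 minutes under the schedule in the question.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1)

def window_end(event_time):
    """End of the aligned 5 minute summary window that contains event_time."""
    start = EPOCH + ((event_time - EPOCH) // WINDOW) * WINDOW
    return start + WINDOW

def missed_summary(event_time, index_time, search_delay=timedelta(minutes=5)):
    """True if the event was indexed after the summary search for its window ran."""
    return index_time > window_end(event_time) + search_delay

# Sample event: stamped 08:02:10 but only indexed at 08:12:30.
event_time = datetime(2010, 9, 7, 8, 2, 10)
index_time = datetime(2010, 9, 7, 8, 12, 30)

# With a 5 minute delay the search for 08:00-08:05 runs at 08:10, before the event arrives.
print(missed_summary(event_time, index_time))
# Delaying the search by 15 minutes moves the run to 08:20, after the event arrives.
print(missed_summary(event_time, index_time, search_delay=timedelta(minutes=15)))
```

The second call illustrates the rule of thumb above: a longer delay trades freshness of the summary for a better chance that late events are already indexed when the search runs.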
Asked: 07 Sep '10, 08:43

Seen: 1,081 times

Last updated: 08 Sep '10, 13:23

Copyright © 2005-2014 Splunk Inc. All rights reserved.