We have around 80 saved searches that run every minute on our search head. Each night the search dispatch times slide from running on time to being dispatched two hours late by the end of the night, at which point we have to restart the search head. We have one search head and two indexers. Our search head has 32 cores, and max_searches_per_cpu is set to the default of 4. We therefore think we should be able to run 4 + 32 * 4 = 132 saved searches concurrently. Is this math correct? How many search heads should be used if we are trying to run 80 saved searches per minute? Does anyone run a similar number of searches? How many search heads gives you adequate performance?
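A minimal sketch of the concurrency arithmetic in the question, assuming a limits.conf-style formula of base + per-CPU quota times cores (the base value of 4 used here is taken from the question, not verified against any Splunk version's defaults; check your own limits.conf):

```python
def max_concurrent_searches(cores, max_searches_per_cpu=4, base_max_searches=4):
    """Rough ceiling on concurrent searches, per the assumed formula
    base_max_searches + max_searches_per_cpu * cores.
    Both default values here are the poster's numbers, not confirmed defaults."""
    return base_max_searches + max_searches_per_cpu * cores

# For the 32-core search head described above:
print(max_concurrent_searches(32))  # 4 + 4 * 32 = 132
```

So the arithmetic itself checks out; whether 132 concurrent searches is actually achievable depends on how long each search runs, not just the configured ceiling.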
I'll ask: why so many searches every minute? Do the people consuming these alerts actually act on them within a minute? It's like when people come to understand the price tag attached to the fabled "five nines" (99.999%) of uptime.
It might make more sense to run a handful at a time, on a five-minute schedule, with some at :01, some at :02, some at :03, repeating around the clock.
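One way to picture that staggering: distribute the 80 searches round-robin across the five minute-offsets of a five-minute cron cycle. This is a hypothetical sketch with made-up search names, not actual scheduler configuration:

```python
# Spread 80 searches across offsets 0-4 of a 5-minute schedule, so about
# 16 fire at :00/:05/:10..., 16 at :01/:06/:11..., and so on.
searches = [f"search_{i:02d}" for i in range(80)]  # placeholder names

schedule = {offset: [] for offset in range(5)}
for i, name in enumerate(searches):
    schedule[i % 5].append(name)

for offset, names in sorted(schedule.items()):
    # In cron syntax, a minute field of "offset-59/5" repeats every 5 minutes.
    print(f"{offset}-59/5 * * * *  ->  {len(names)} searches")
```

Each search still runs twelve times an hour, but the scheduler only has to dispatch about 16 at any given minute instead of 80.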
Do all 80 saved searches finish within 1 minute? If not, you will keep stacking searches onto the queue until it's so bogged down you have to restart. That search head sounds beefy enough; how much RAM does it have? Why do you require 80 saved searches every minute? It feels as if some optimization could be done.
After running the search, hit the little icon that looks like a stair-stepped column chart, to the left of the "Export" link in the results area. It should produce a column chart showing, quite literally, the "tall poles" among your long-running searches. If you want to see the same data in tabular form, run:
index=_internal source=*scheduler.log earliest=@h | stats count, sum(run_time) AS runtime by savedsearch_name | sort - runtime
Change the "earliest" parameter as desired; the example shown goes back to the top of the hour. count is the number of runs, and runtime is in seconds.
Based on the specs you provided for your search head and the amount of data processed, you're good in terms of hardware. I'd start by following sowings' recommendation and back the schedule off to a five-minute minimum. Five minutes' notice is normally more than enough and doesn't cause huge issues; most users don't even notice something is down that fast.
Concerning optimization strategies: first identify the longest-running searches (using that scheduler.log query), then, as a first step, reduce the number of events scanned and look for redundancies. There are many more techniques beyond that.
Detailed optimization depends on each individual query, for that I'd redirect you to our sales department 🙂
The search index=_internal source=*scheduler.log | timechart sum(run_time) AS runtime by savedsearch_name returned gibberish for me.
We have spread out our saved searches, and the result is that we are running 80 per minute. Let's move away from how many searches run per minute and look instead at how we can tune our search head to better utilize the available resources. We index about 30 GB of data per day, give or take. When you say to optimize our saved searches, can anyone point me to documentation that describes this?
Consider evaluating the "tall pole" with a search like the one below, and optimize some of the searches themselves to improve the overall runtime.
index=_internal source=*scheduler.log | timechart sum(run_time) AS runtime by savedsearch_name
Very true. mookie: how much are you indexing per day?
Concerning the "how many search heads" part of your question: whether performance is bottlenecked by the search head or by the indexers depends on the searches themselves. If they're data-heavy, you might be maxing out your indexers serving data rather than the search head; in that case you may see gains either by optimizing your searches to scan fewer events or by adding more indexers to serve up the data faster.
My first task would be to see if you can combine any of the searches into a smaller number, and then increase the interval between runs to be longer than the execution time, if possible. For example, if you have 5 service owners, you may be able to create a single search covering all of their services and send one email or alert with all the relevant information. Also look at optimizing the searches themselves; this becomes very important as you scale up in size.
We have 80 saved searches that run every minute because that is what the service owners requested to validate the environment. The searches are not very complex and usually finish very fast, but not all 80 are able to complete within the span of the minute. As you said, this backs up the search head, and the searches are executed later and later than they were scheduled for. The search head has 32 GB of RAM.