Hello
I have a scheduled search that runs every 5 minutes, and if a failure occurs Splunk should send an email. Sometimes errors are written to the logs and indexed in near real time, but the scheduled search doesn't send an email.
I found these messages in the Splunk logs:
09-05-2013 09:30:11.732 +0200 INFO SavedSplunker - savedsearch_id="admin;tivoli;ITM - TEPS Memory pthread Error", user="admin", app="tivoli", savedsearch_name="ITM - TEPS Memory pthread Error", status=skipped, scheduled_time=1378366200
WARN SavedSplunker - Max alive instance_count=1 reached for savedsearch_id="admin;tivoli;ITM - TEPS Memory pthread Error"
So it seems that Splunk skipped the saved search, and I didn't get an email although failures occurred.
Is it possible to queue saved searches, so that a skipped run can still execute before the next scheduled interval for the same saved search is reached?
Alternatively, I could change limits.conf, but I'm not sure whether this is useful on my 2-core machine:
[search]
# the maximum number of concurrent searches per CPU
max_searches_per_cpu = 2
# the base number of concurrent searches
base_max_searches = 10
[scheduler]
# the maximum number of searches the scheduler can run, as a percentage
# of the maximum number of concurrent searches
max_searches_perc = 50
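To see what those settings actually allow, the concurrency ceilings can be worked out directly from the formula described in the stanza comments. This is a sketch of that arithmetic, assuming the 2-core machine mentioned above:

```python
# Sketch: how Splunk derives its search concurrency limits from limits.conf.
# The formula (base + per-CPU allowance * cores, with the scheduler getting a
# percentage of the total) follows the comments in the stanza above.
cpu_cores = 2                 # assumption: the 2-core machine from the question
max_searches_per_cpu = 2      # [search] stanza
base_max_searches = 10        # [search] stanza
max_searches_perc = 50        # [scheduler] stanza

max_concurrent_searches = base_max_searches + max_searches_per_cpu * cpu_cores
scheduler_limit = max_concurrent_searches * max_searches_perc // 100

print(max_concurrent_searches)  # 14 concurrent searches in total
print(scheduler_limit)          # 7 of them available to the scheduler
```

Note that the WARN message quoted above is about the per-search instance limit (instance_count=1) for that one saved search, not about this global ceiling, so raising these values would not by itself stop the skips.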
Thanks for all your tips and hints
Robert
Did you say you are running on a 2-core machine? Indexer+SH? If so, I am not at all surprised that you see skipped searches. Indexing on a single indexer can consume up to 5-6 cores alone, which leaves no headroom for searches to execute.
The fact that your search gets skipped means that the previous run has not completed when the next one is scheduled. In other words: Your search scheduled to run every five minutes sometimes takes more than five minutes to run, very likely because your system is completely starved for (compute) resources.
You are running on a completely underpowered system, even without doing any searches. If you have any option of increasing your core count, that is your best bet for sure, no matter how little data you may be indexing. Your problem may be compounded by having suboptimal disk storage underneath, but without more details about your environment, it's hard to provide guidance.
The bottom line is that any Splunk indexer running with fewer than 8 cores is likely to show these symptoms under any search workload, sooner or later depending on data ingest volume. You need more horsepower. 🙂
Why can't we adjust Max alive instance_count=1 for the scheduler?
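That per-search cap is in fact configurable per saved search in savedsearches.conf. A hedged sketch (settings as documented for Splunk 6.x; the stanza name is taken from the savedsearch_name in the log above, and allowing overlapping runs on an already starved box will usually make the resource problem worse, not better):

```
# savedsearches.conf (sketch)
[ITM - TEPS Memory pthread Error]
# allow up to 2 concurrent instances of this saved search
max_concurrent = 2
# 0 = continuous scheduling: the scheduler makes up missed runs
# instead of skipping them (the default, 1, skips late runs)
realtime_schedule = 0
```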