Why are all scheduled jobs being run on one search head in our Splunk 6.3 search head cluster, causing some jobs to be skipped?

ejharts2015
Communicator

We recently upgraded to a 3-node search head cluster of 8-core boxes. Our limits.conf across the cluster is:

max_searches_perc = 50
base_max_searches = 10
max_searches_per_cpu = 10

So according to some Splunk math:
max_searches_per_cpu × number of CPUs + base_max_searches = Total number of searches
10 × 8 + 10 = 90 per search head, and 90 × 3 search heads = 270 concurrent searches
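The arithmetic above can be checked with a minimal Python sketch. The function name `search_limits` and its parameter names are hypothetical and chosen only for illustration. It also applies max_searches_perc, which limits.conf describes as the percentage of the concurrent-search maximum that the scheduler may use. The integer rounding and the simple multiplication by 3 are assumptions, not a statement of how the cluster captain actually enforces quotas.

```python
def search_limits(cpus, max_searches_per_cpu, base_max_searches, max_searches_perc):
    """Return (total concurrent searches, scheduler share) for one search head.

    Hypothetical helper: mirrors the formula quoted in the post, plus the
    max_searches_perc percentage for scheduled searches.
    """
    total = max_searches_per_cpu * cpus + base_max_searches
    # Assumption: the scheduler share is rounded down to a whole number.
    scheduled = total * max_searches_perc // 100
    return total, scheduled


# Values from the limits.conf in the post, on an 8-core box.
total, scheduled = search_limits(
    cpus=8, max_searches_per_cpu=10, base_max_searches=10, max_searches_perc=50
)
members = 3  # search heads in the cluster

print("per search head:", total, "concurrent,", scheduled, "scheduled")
print("naive cluster sum:", total * members, "concurrent,", scheduled * members, "scheduled")
```

With these settings the scheduler share per search head is half of the 90 concurrent searches, so the ceiling for scheduled jobs is lower than the 270 figure suggests.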

I recently noticed that some of our scheduled jobs are being skipped for no obvious reason. After some digging, I ran this search:

index=_internal source=*metrics.log group=searchscheduler | timechart partial=false span=1m sum(dispatched) AS Started, sum(skipped) AS Skipped by splunk_server | table _time Started*

It showed that ALL our scheduled jobs were running on ONE search head. I assumed that with search head clustering, scheduled searches would be divided across the cluster based on the load of each search head. Since that does NOT appear to be the case, how can I reasonably expand the cluster to handle our increased number of scheduled reports?

1 Solution

behlkush
Path Finder
index=_internal source=*metrics.log group=searchscheduler | timechart partial=false span=1m sum(dispatched) AS Started, sum(skipped) AS Skipped by splunk_server | table _time Started*

In my opinion, dispatched events are always reported by the captain. You will get a better idea if you run this instead:

index=_internal sourcetype=splunkd component=Metrics group=searchscheduler host=splunksearchhead* | timechart span=1h sum(completed), sum(skipped) by host

and then see if the searches are getting distributed properly across search heads.

ejharts2015
Communicator

This is a way better search. Thanks!
