scheduled summary searches in a distributed setup - where do they run?

briang67 · 06-19-2012

Hello,

I have a search head that has the webintelligence app loaded. I've created the summary indexes on a pair of remote indexers. Populating the indexes is done via scheduled searches - my question is where should these searches actually run? I'm not sure what is the best setup for this:

With forwarding enabled on the search head, it seems that the summary searches run on the search head, which then forwards the results to the remote indexes.

The data needed to generate the summaries is located on the same remote indexers as the summary indexes. Does this mean that the search is distributed from the search head, the results are sent back to the search head, and then they are forwarded back to the remote summary indexes?

The app is not actually installed on the remote indexers. If I install the app on both and enable just half of the scheduled searches on each (and disable all from the search head) - will this accomplish the same thing with the benefit that the searches would be running locally where the source data and the summary indexes are actually located? Is this any more efficient than the first method?

Thanks!

araitz · 06-22-2012

Great questions!

To keep things simple, most folks just run the summary indexing searches on the search head and have the summary index data stored on the search head as well.

You can certainly configure the search head to forward data to the remote indexers, which will result in the summary data being stored on the remote indexers. Your search head will still pick up the data, assuming that all the indexers are search peers. Yes, you have essentially captured the data flow in that case:

data from forwarders --> indexers 
search head runs saved search --> indexers return data --> search head reduces --> data is forwarded back to summary index on indexers
search head runs dashboard powering search --> indexers return data from their summary indexes
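
To make that flow concrete, here is a toy Python model of it. This is only a sketch and not Splunk code: the indexer names, the sample events, and all function names are made up for illustration, and the round-robin forwarding is a simplification of what load-balanced forwarding does.

```python
from collections import Counter
from itertools import cycle

# Hypothetical raw web events held by each indexer (search peer).
INDEXER_EVENTS = {
    "indexer01": ["/home", "/home", "/cart"],
    "indexer02": ["/home", "/cart", "/cart", "/checkout"],
}


def map_on_indexer(events):
    """Each indexer computes partial counts over its own local data."""
    return Counter(events)


def reduce_on_search_head(partials):
    """The search head merges the partial results into final summary rows."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return sorted(total.items())


def forward_summary(rows, indexers):
    """Send each summary row to the next indexer in turn (simplified load balancing)."""
    stored = {name: [] for name in indexers}
    for row, target in zip(rows, cycle(indexers)):
        stored[target].append(row)
    return stored


def dashboard_search(summary_index):
    """A dashboard search gathers summary rows back from every indexer."""
    return sorted(row for rows in summary_index.values() for row in rows)


# Saved search: indexers return data, the search head reduces it.
partials = [map_on_indexer(events) for events in INDEXER_EVENTS.values()]
summary_rows = reduce_on_search_head(partials)

# The reduced rows are forwarded back to the summary index on the indexers.
summary_index = forward_summary(summary_rows, list(INDEXER_EVENTS))

# Later, a dashboard search reads the summary rows from all indexers.
dashboard_rows = dashboard_search(summary_index)
```

The point of the model is that the summary rows only exist after the search head has merged the partial results, so they have to travel back out to the indexers before they can be stored there.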

Splunk will take care of moving the parts of the app that the indexers need (such as eventtypes and field extractions) from the search head to the indexers as a built-in part of distributed search - we call this 'bundle replication'.

Most people don't run the saved searches on the indexers. Doing so increases management complexity quite a bit. Moreover, since all the data stays local (i.e. indexer01 only has summary index data from its own web log data), you don't really get any performance increase. Forwarding in auto-loadbalanced mode from your search head to your indexers, by contrast, spreads the data around with a bit less normality, which in general is a good thing.
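
A rough way to picture the difference in where the summary data ends up, again as a toy Python sketch rather than anything Splunk actually runs. The row counts are invented, and the per-row round-robin is an assumption standing in for auto-loadbalanced forwarding, which in practice switches targets on its own schedule.

```python
from itertools import cycle

# Hypothetical number of summary rows derived from each indexer's own data.
LOCAL_ROWS = {"indexer01": 900, "indexer02": 100}


def stored_when_run_locally(local_rows):
    """Searches run on each indexer: every row stays where it was produced."""
    return dict(local_rows)


def stored_when_forwarded(local_rows):
    """Searches run on the search head: rows are handed out to indexers in turn."""
    stored = dict.fromkeys(local_rows, 0)
    targets = cycle(local_rows)
    for _ in range(sum(local_rows.values())):
        stored[next(targets)] += 1
    return stored


local_layout = stored_when_run_locally(LOCAL_ROWS)
forwarded_layout = stored_when_forwarded(LOCAL_ROWS)
```

In this model the local layout keeps whatever skew the source data had, while the forwarded layout ends up split across both indexers.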

briang67 · 06-25-2012

Thank you for the response. In my case the search head could not keep up with serving both the dashboard real time searches and the summary index searches - even though I was sending the data back to the remote indexes.

From testing it looks like the most efficient config for me was to have a server dedicated to the summary indexes - responsible for both running the summary searches and storing the data.

araitz · 06-25-2012

Good point, I neglected to mention that configuration entirely. Some folks find that dedicating a search head to scheduled searches, either summary index populating or alerting, is the best approach for them.

