How do you handle Summary Indexing in a distributed environment?

twinspop
Influencer

I'm very curious to hear how other admins are handling summary indexing with multiple indexers and search heads.

  • Schedule them on 1 Search Head?
  • Schedule on multiple SHs with Pooling enabled?
  • Schedule them on each Indexer?
  • Schedule them on 1 Indexer with distributed search to the others?
  • Have a dedicated "Collection" Search Head?
  • Send the results to the Indexers?
  • Set up local Summary Indexes on the SH(s)?
  • Dedicated Summary Indexers?

It seems like every option above is imperfect, making for many compromises. Please share your SI architecture and why you chose it.

Thanks,

jon

EDIT - I found this previous answer. It still leaves some questions though. If I want to search from the SH and collect into an index on the indexer, do I need to create a "dummy" index on the search head? Without the custom index on the SH, it won't let me schedule it. Seems a little hacky.

khourihan_splun
Splunk Employee

Forwarding the events and summaries to your indexers and turning off indexing on the Search Head is a best practice for several reasons:

  1. It ensures the data is replicated and backed up via the indexer cluster. (resilient)
  2. If you add another search head, the users on that search head will see the same data. (consistent)
  3. It distributes the search load among several indexers, reducing the time a large search takes to complete. (performant)

Hope this helps,
Kyle
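
As a rough illustration of the forwarding setup described above, the search head's outputs.conf might look something like the sketch below. The group name my_search_peers, the indexer addresses, and the receiving port 9997 are placeholders assumed for the example, not values from this thread, and the available settings may differ between Splunk versions.

```ini
# outputs.conf on the search head (sketch; names and addresses are assumptions)

# Do not keep a local indexed copy of anything on the search head
[indexAndForward]
index = false

[tcpout]
# Send everything, including summary-index events, to this output group
defaultGroup = my_search_peers
# Forward all indexes rather than applying the default forwarded-index filter
forwardedindex.filter.disable = true
indexAndForward = false

# Hypothetical indexers, each listening on a receiving port
[tcpout:my_search_peers]
server = 10.10.10.1:9997,10.10.10.2:9997
```

With more than one address in the server list, the search head load-balances its output across the listed indexers, which is what spreads the summary data over the indexing tier.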

khourihan_splun
Splunk Employee

If you don't create the "dummy" index on the search head you will get this error:

Encountered the following error while trying to update: In handler 'savedsearch': Index name=your_index_here does not exist. The summary index must exist in order for a scheduled search to populate it.

The search head uses indexes.conf to build a list of indexes it can operate on. So without it listed on the search head, you'll get this error.

Defining it on the SH also fixes autocomplete, so when you type index= in the search bar, that index shows up.
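
A minimal sketch of such a "dummy" definition follows, assuming a summary index named my_summary (a made-up name for illustration). The same stanza would also need to exist on the indexers that actually store the data.

```ini
# indexes.conf on the search head (sketch; my_summary is a hypothetical name)
# The stanza only has to exist so the search head recognizes the index name.
# If the search head forwards all of its data, little or nothing is stored here.
[my_summary]
homePath   = $SPLUNK_DB/my_summary/db
coldPath   = $SPLUNK_DB/my_summary/colddb
thawedPath = $SPLUNK_DB/my_summary/thaweddb
```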

fk319
Builder

I have the same issue, and I was looking for you to solve my problem for me. I tried to set up a search and store the summary index on one of the search heads. I set up an index on that one SH, and I use a pool for my SHs. The problem is that the other SH wants to run the search and seems to be doing so; it is just not saving the data.

Brian_Osburn
Builder

I have 1 search head and 2 indexers (all are individual physical machines). I don't have any real indexes on my search head - everything gets forwarded to the 2 indexers. This includes Summary Indexes. So I create a summary index on my search head and both indexers, just to ensure everything works okay.

It does seem a little hacky, but it's probably the best way to handle it.

Brian
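
For readers who want to see what the scheduled search side of such a setup could look like, here is a hedged sketch of a savedsearches.conf stanza on the search head. The search name, the source index web, the sourcetype, the schedule, and the summary index my_summary are all assumptions for illustration. The summary index named here has to be defined on the search head and on the indexers, as described above.

```ini
# savedsearches.conf on the search head (sketch; all names are hypothetical)
[Summary - web status counts]
# Run every 15 minutes over the previous 15 minutes
enableSched = 1
cron_schedule = */15 * * * *
dispatch.earliest_time = -15m@m
dispatch.latest_time = @m
# sistats produces summary-friendly partial statistics
search = index=web sourcetype=access_combined | sistats count by status
# Write the results to the summary index
action.summary_index = 1
action.summary_index._name = my_summary
```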

Brian_Osburn
Builder

You set up the search head with search peers - Splunk handles the rest in the background.

Take a look at: http://docs.splunk.com/Documentation/Splunk/4.3/Deploy/Configuredistributedsearch

landen99
Motivator

Distributed searching is completely different from distributed indexing. fk319 asked about the latter and Brian Osburn replied about the former. Distributed indexing means multiple indexers indexing data simultaneously. Distributed searching means searching multiple indexer nodes (any Splunk instance with indexed data) simultaneously, pulling the indexed data back, and merging the results. Search peers are the indexer nodes specified for searching.

Setting up search peers merely enables you to search indexer nodes. All data written by "collect" is stored locally, on the instance that runs the search.

Using search head pooling to share knowledge object (KO) bundles can really kill performance, because it copies all of the KOs, including those for summary indexing, into local copies on each search head.

fk319
Builder

Brian, how do you tell your search where your search index actually resides? (i.e., how do you forward your search results back to the indexers?)
