Getting Data In

What is the difference between a normal set of indexers versus an indexer cluster?

sarnagar
Contributor

Hi All,

What's the difference between a normal set of indexers and an indexer cluster? I have read the documentation about it and still have a few doubts.

For example:
1. I have a set of 5 indexers and have configured my forwarders to send data to these 5 indexers. If one of my indexers fails, does that mean indexing will be affected on the remaining 4 indexers, unlike in a cluster architecture? Will my other indexers continue to collect data?
2. The docs say that multiple copies of the data are stored in a cluster. Even with normal indexers, the data is forwarded to all indexers, right? So there are still multiple copies, i.e. all 5 of my indexers have the data, right?

Please correct me if my understanding is incorrect.

lguinn2
Legend

Your understanding is incorrect.

Scenario 1: you have 5 indexers that are not clustered. You have forwarders set up to "autoloadbalance" between the indexers. Each indexer will have approximately 20% of the data. If one indexer goes down, the forwarders will continue to send data to the surviving 4 indexers. The surviving 4 indexers will continue to index the incoming data as normal. If the failed indexer cannot be recovered, approximately 20% of the indexed data will need to be recovered from a backup, etc. Also, searches will be incomplete until the indexer returns to service or the data is recovered.

Scenario 2: you have 5 indexers in a cluster. Forwarding still operates in the same way, where forwarders "autoloadbalance" their forwarding between the indexers. However, as an indexer parses the data and places it into the index buckets, it also replicates the data to other indexers. The number of copies (the replication factor) is configurable, along with other options. If an indexer is lost, search may continue uninterrupted using the replicated data, and the index buckets will be rebuilt as needed to ensure that sufficient copies of the data exist.

While there are technically a few other ways that you could make "live" copies of your indexes, the best way is to use clustering. Most other techniques are more expensive, harder to implement/manage or don't allow seamless recovery from indexer loss.
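
To make the two scenarios concrete, here is a minimal Python sketch, not Splunk code. It assumes a simplified round-robin placement (real forwarders switch indexers on a time interval, and real clusters replicate whole buckets rather than single events). The names `place_events` and `searchable_fraction` are hypothetical and exist only for this illustration.

```python
def place_events(num_events, num_indexers, replication_factor=1):
    """Return a list of sets; entry i holds the event ids stored on indexer i.

    Assumption: event n lands on indexer n % num_indexers, and extra copies
    go to the next indexers in order. This only approximates load balancing.
    """
    indexers = [set() for _ in range(num_indexers)]
    for event_id in range(num_events):
        primary = event_id % num_indexers
        for k in range(replication_factor):
            indexers[(primary + k) % num_indexers].add(event_id)
    return indexers


def searchable_fraction(indexers, failed, num_events):
    """Fraction of events still present on at least one surviving indexer."""
    surviving = set()
    for i, events in enumerate(indexers):
        if i not in failed:
            surviving |= events
    return len(surviving) / num_events


# Scenario 1: 5 standalone indexers, one copy of each event.
standalone = place_events(1000, 5)
# Scenario 2: 5 clustered indexers, three copies of each event (assumed value).
clustered = place_events(1000, 5, replication_factor=3)

print(searchable_fraction(standalone, {2}, 1000))  # 0.8
print(searchable_fraction(clustered, {2}, 1000))   # 1.0
```

With one indexer down, the standalone set can only search about 80% of the data, while the replicated set still has a copy of everything.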

sarnagar
Contributor

Hi lguinn,
Thanks for the response. A few clarifications, please.
Scenario 1: Since each indexer will have approximately 20% of the data, does that mean each indexer collects different data? I'm not exactly sure how the (non-clustered) indexers are configured in our organisation. When I search for a host on any indexer, I can see its data. In that case, how does the host forward the data to all indexers? It's like replicated copies, right?
Also, you mentioned "approximately 20% of the indexed data will need to be recovered from a backup, etc." When the indexer itself goes down and is not able to collect the data, how can we have a backup?

lguinn2
Legend

No, the host does not forward the data to all indexers, whether you are using clustering or not.

The "distributed search" feature of Splunk allows a Splunk instance to act as a search head. (A Splunk server can be both an indexer and a search head.) There are no copies; each indexer collects different data. This is what is running in your non-clustered Splunk environment.

So, you run a search on a particular Splunk instance, and distributed search runs the search on all the indexers and retrieves the search results, presenting them to you.

Hopefully, you take routine backups in your production environment - on all kinds of servers - to protect against data loss. You should have similar backups of your Splunk servers and data. Should a server be lost, you should be able to retrieve the data from the backup media (whatever it is) and use that to restore the server. This is not a Splunk thing, it is server management best practice.

In a clustered environment, search heads still run a distributed search, but it is configured a little differently. Still, a search will be executed in parallel across a set of indexers in both cases.
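
The distributed search described above can be sketched roughly as follows. This is plain Python for illustration, not a Splunk API: the function names and the event fields (`time`, `host`, `msg`) are assumptions, and the loop runs sequentially where a real search head would query the indexers in parallel.

```python
def search_indexer(events, host):
    """Run the search locally on one indexer's own slice of the data."""
    return [e for e in events if e["host"] == host]


def distributed_search(indexers, host):
    """Act as the search head: query every indexer, then merge by time."""
    results = []
    for events in indexers:
        results.extend(search_indexer(events, host))
    return sorted(results, key=lambda e: e["time"])


# Three indexers, each holding different events (no copies).
indexers = [
    [{"time": 1, "host": "web01", "msg": "start"},
     {"time": 4, "host": "db01", "msg": "query"}],
    [{"time": 2, "host": "web01", "msg": "request"}],
    [{"time": 3, "host": "web01", "msg": "response"}],
]

hits = distributed_search(indexers, "web01")
print([e["msg"] for e in hits])  # ['start', 'request', 'response']
```

This is why a host's data appears complete from wherever you search, even though each indexer stores only part of it.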

sarnagar
Contributor

Hi @lguinn,

Sorry, I'm back with a similar doubt.

In our organisation we have 4 clustered search heads (SHs) set up in Non-Prod, with one single common URL for this Non-Prod environment.

So my doubt here is: which SH will this URL point to when we run a query? Will the load balancer decide and execute the query on a particular SH?

Another doubt: if I need to check the logs for troubleshooting, on which search head should I query for errors, since I don't know which search head is executing the saved searches etc.?

lguinn2
Legend

All search heads can search all the data, assuming that they have been properly configured. Therefore, you should see the same results for a search, no matter where it is run.

The load balancer must assign a user to a particular search head for the entire session. Once a user is assigned to a search head, all of that user's ad-hoc queries will be run on the search head where the user is logged in.

Scheduled searches are not managed by the load balancer; scheduled searches run in the background. If the search heads are clustered, then the search head captain determines which search head will run the scheduled search.
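
The session assignment described above can be sketched like this. It is a toy Python model, not how any particular load balancer is implemented: the class name `StickyLoadBalancer`, the round-robin choice for new users, and the host names are all assumptions made for the example.

```python
from itertools import cycle


class StickyLoadBalancer:
    """Assign each new user to a search head and keep them there."""

    def __init__(self, search_heads):
        self._next_head = cycle(search_heads)  # round-robin for new sessions
        self._sessions = {}                    # user -> assigned search head

    def route(self, user):
        # A user seen before always goes back to the same search head.
        if user not in self._sessions:
            self._sessions[user] = next(self._next_head)
        return self._sessions[user]


lb = StickyLoadBalancer(["sh1", "sh2", "sh3", "sh4"])
first = lb.route("alice")
lb.route("bob")
print(lb.route("alice") == first)  # True
```

Scheduled searches are outside this model, since the captain rather than the load balancer decides where they run.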

sarnagar
Contributor

Thanks a lot for the input! Understood now.
