Deployment Architecture

What is the max recommended number of indexes on a single indexer in AWS?

kalibaba2021
Path Finder

Hello Community,

I am looking at deploying Splunk Enterprise on AWS on a HEFTY compute-optimized EC2 instance with attached EBS. I'd like to maximize the number of indexes on this instance, since search performance is of no concern.

I see the default index size is 500 GB, but I also know I can configure indexes.conf to whatever I want.

For example, if I think I'll have ~97 TB of data, I could set maxVolumeDataSizeMB = 101711872 (97 × 1024 × 1024) on a single BIG indexer. But of course, just because I can doesn't mean I should.
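For reference, maxVolumeDataSizeMB applies to a volume stanza in indexes.conf, not to an individual index (per-index caps use maxTotalDataSizeMB). A minimal sketch, with hypothetical paths and index names:

```ini
# indexes.conf -- illustrative only; adjust paths and sizes for your environment
[volume:primary]
path = /opt/splunk/var/lib/splunk
# cap the whole volume at ~97 TiB (97 * 1024 * 1024 MB)
maxVolumeDataSizeMB = 101711872

[big_index]
homePath   = volume:primary/big_index/db
coldPath   = volume:primary/big_index/colddb
# thawedPath cannot use a volume reference; it must be a literal path
thawedPath = $SPLUNK_DB/big_index/thaweddb
# per-index cap (the default is 500000 MB, i.e. ~500 GB)
maxTotalDataSizeMB = 101711872
```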

I see no clear recommendation on how to design multiple indexes in relation to indexers and search heads. Maybe because it always depends. 🤗

In my case, since I don't care about performance, can I put everything on one BIG EC2 instance? Or split it across 2 BIG machines? That is, install the indexer and SH on the same instance.

Thanks in advance👍

 


isoutamo
SplunkTrust

Hi

You are right, it depends 😉 When you use EBS-backed EC2 nodes with gp3-type EBS disks, you definitely get more than 1200 IOPS, which is enough for normal use. But how big can the indexes be on one node? That depends on, e.g., how much new data per day you are ingesting. If you are not concerned about searches, you could probably estimate 150-200 GB/day per indexer, but if you are using ITSI/ES then it's less than 100 GB/day.
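As a back-of-envelope illustration of that rule of thumb (the restore volume and time window are hypothetical, and treating the restore rate like ordinary ingest is a simplification):

```python
import math

# Hypothetical scenario: 97 TiB of archived data to bring back,
# spread over a 30-day window.
total_tb = 97
days = 30
daily_ingest_gb = total_tb * 1024 / days        # ~3311 GB/day

# Rough per-indexer daily capacity when search load is light
# (the 150-200 GB/day figure above; using the upper bound).
per_indexer_gb_per_day = 200

# Indexers needed to keep up with that rate.
indexers_needed = math.ceil(daily_ingest_gb / per_indexer_gb_per_day)
print(indexers_needed)  # 17
```

Stretching the window, or accepting a slower restore, shrinks that number accordingly.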

If you have an indexer cluster, you also need to think about how long you can wait for rolling restarts. Restarts cause disruption for longer when you have fewer nodes with more data, since primaries are reassigned after the reboot completes.

Those are only some of the items you should think about before making a decision. Fortunately, you can change your environment later if needed. To make that easier, I propose you install at least an indexer cluster, or even a multisite version.

r. Ismo

kalibaba2021
Path Finder

isoutamo, thank you for the quick response. We plan on restoring data from the archive S3 bucket into this Splunk instance only on a per-need basis. So a very large amount of data will be ingested in a relatively short time period, but only rarely. The exact volume of data is unknown, but 50-100 TB is possible.

But not on a continuous basis, so the "per-day" metric will only apply during these rare ingestion events. Neither redundancy nor fast searches are a major concern.

What I gather from your response is: start with at least one indexer cluster with 2 large nodes. Is a dedicated search head needed, or can I install the SH on one of the nodes in said cluster?

 


isoutamo
SplunkTrust

As this is just a restore instance for thawed data, you could start with one instance. If/when you use several instances for storing data, you must use your own scripts to retrieve the data from S3 and rebuild it into the thaweddb directories. This will probably be the most time-consuming part of your process. I propose you develop and test this environment and process early, to learn the time needed from request to searchable data. With this amount of data, it could be days before you can start searching!
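A sketch of that per-bucket restore flow (the index name, bucket directory, and S3 path are hypothetical; `splunk rebuild` is the standard command for making a thawed bucket searchable):

```shell
# Illustrative only -- adjust index name, bucket directory, and S3 path.
# 1. Copy an archived bucket from S3 into the index's thaweddb directory.
aws s3 cp --recursive \
    s3://my-archive-bucket/frozen/myindex/db_1389230491_1389230488_5/ \
    $SPLUNK_HOME/var/lib/splunk/myindex/thaweddb/db_1389230491_1389230488_5/

# 2. Rebuild the bucket's index files and metadata so it becomes searchable.
$SPLUNK_HOME/bin/splunk rebuild \
    $SPLUNK_HOME/var/lib/splunk/myindex/thaweddb/db_1389230491_1389230488_5

# 3. Restart Splunk so the thawed bucket is picked up.
$SPLUNK_HOME/bin/splunk restart
```

Multiplied across thousands of buckets and 50-100 TB, steps 1 and 2 are where the days go, which is why testing the end-to-end timing beforehand matters.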

kalibaba2021
Path Finder

@isoutamo - thanks very much for the info!


VatsalJagani
SplunkTrust

@kalibaba2021 

If you don't care about Splunk performance:

  • Do whatever you want.

 

If you care about Splunk performance and want to design the proper system:

 

I hope this helps!!! Kindly upvote if this helps!!!

kalibaba2021
Path Finder

@VatsalJagani - thank you for the info!
