I'm a new customer — can you help me with an Index best practices question for Splunk Cloud?

bearlmax
New Member

We are a new customer to Splunk and are about to start ingesting data into our Splunk Cloud instance. I'm curious what the community's recommendations are for setting up indexes.

We plan to ingest about 10 different data sources from our prod environment. Some of these data sources have tens of millions of events per day. Some have just a few thousand events per day. Is it better to create one giant index called "prod" and have the different data sources split out by source type? Or is it better to give each data source its own index?

Our use case is that we mostly run searches within a given data source, but there will be occasions where we submit searches that join data from the different data sources. We will be generating lookup tables with these different data sources to help with this.

From my understanding, the general rule is that you don't want to have everything in one index, but too many indexes also become difficult to manage. We are leaning towards separating out our data sources into their own indexes, which will mean that we'll end up with about 40 indexes.

Does that sound like a lot to manage? Do people spend a lot of time managing their indexes? My expectation is that we won't need to do much index management once the indexes have been set up and configured.

woodcock
Esteemed Legend

There are 3 things to consider when laying out indexes:
1: Visibility/invisibility (RBA, role-based access): who should and should not see which data? Map out the roles that need to see the data and the roles that must not be able to see it. Access to data is only reliably enforced at the index level.
2: Retention: how long do you need to keep each kind of data? Bucket rolling and deletion are implemented at the index level.
3: Value/Risk: how important is the retention goal for this data? Suppose very important and not-so-important data share an index, you must keep the data for a year, and you have enough space to hold about 15 months. If somebody mistakenly turns on debug for the not-so-important data stream and it sends in a year's worth of data in a single day before anyone notices, most, if not all, of your important data has been aged out prematurely, and it is GONE.
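
The arithmetic behind point 3 can be sketched in a few lines of Python. This is a simplified model, not how Splunk is implemented: it assumes one bucket per day, a purely size-based cap, and oldest-first eviction. The function name, the capacity of 450 units (roughly 15 months, with 1 unit = one normal day of data) and the burst size are all illustrative assumptions.

```python
from collections import deque

def simulate_retention(capacity, daily_volumes):
    """Model a size-capped index that drops its oldest bucket first.

    capacity      -- total space, in units of "one normal day of data"
    daily_volumes -- volume ingested on each day, in the same units
    Returns the day numbers (1-based) whose data is still retained.
    """
    buckets = deque()  # (day, volume), oldest on the left
    used = 0
    for day, volume in enumerate(daily_volumes, start=1):
        buckets.append((day, volume))
        used += volume
        # Over the cap: evict the oldest days until the index fits again.
        while used > capacity and buckets:
            _, evicted = buckets.popleft()
            used -= evicted
    return [day for day, _ in buckets]

CAPACITY = 450             # assumed: about 15 months of normal volume
normal_year = [1] * 365    # one year of ordinary ingestion
burst_day = 1 + 365        # a normal day plus a year's worth of debug noise

shared = simulate_retention(CAPACITY, normal_year + [burst_day])
normal_days_left = sum(1 for day in shared if day <= 365)
print("normal days still searchable in the shared index:", normal_days_left)
```

With the important data in its own index, the same burst would only evict data from the noisy index, which is the point of separating by value/risk.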

If you get these 3 right, everything else will be just fine.
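
One rough way to turn the first two criteria into a starting index layout is to group data sources that share the same readers and the same retention. The sketch below does only that; the source names, roles and retention values are made up for illustration, and the value/risk criterion may still justify splitting a group further.

```python
from collections import defaultdict

# Hypothetical inventory: (data source, roles allowed to read it, retention in days)
SOURCES = [
    ("web_access",   frozenset({"ops", "security"}), 90),
    ("app_logs",     frozenset({"ops", "security"}), 90),
    ("auth_events",  frozenset({"security"}),        365),
    ("db_audit",     frozenset({"security"}),        365),
    ("debug_traces", frozenset({"ops"}),             14),
]

def group_into_indexes(sources):
    """Group sources that have identical access roles and retention.

    Each group is a candidate for one index, because access control and
    retention are both applied per index.
    """
    groups = defaultdict(list)
    for name, roles, retention_days in sources:
        groups[(roles, retention_days)].append(name)
    return dict(groups)

for (roles, retention_days), names in group_into_indexes(SOURCES).items():
    print(sorted(roles), f"{retention_days}d", "->", names)
```

Here five sources collapse into three candidate indexes, which gives a lower bound on how many indexes the access and retention requirements actually force.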

dkeck
Influencer

Hi,

I think once you set up an index, you don't spend much time managing it. You should also consider your role and user concept before setting up your indexes. You might want to restrict access to certain indexes for particular users or teams.

The same goes for retention time: you might want to keep some data for longer or shorter periods.

Searches over more than one index are totally normal, so this will not force you to store your data in one index.
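
As a small illustration, a search that spans several indexes just ORs the `index=` terms together. The helper below is a hypothetical convenience function (not part of any Splunk SDK) that builds such a search string; the index names are only examples.

```python
def multi_index_search(indexes, terms=""):
    """Build an SPL search string that covers one or more indexes.

    indexes -- list of index names to search
    terms   -- optional extra search terms appended after the index clause
    """
    if not indexes:
        raise ValueError("at least one index is required")
    clause = " OR ".join(f"index={name}" for name in indexes)
    if len(indexes) > 1:
        # Parentheses keep the OR from swallowing the terms that follow.
        clause = f"({clause})"
    return f"{clause} {terms}".strip()

print(multi_index_search(["wineventlog", "oracle"], "error"))
```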

So I would set up an index for each type of data. If you have Windows event logs, set up a wineventlog index; if you have Oracle database data, set up an oracle index; and so on. I think it's also the best way to go because most of the TAs and apps for Splunk work that way: each one handles a certain type of data and sets up an index for it.

Within a type of data you can then set up your sourcetypes and sources.

With this number of indexes you will hopefully be thinking about an indexer cluster, which will make it easier to manage your indexes from the master.

This might be worth reading as well: https://hub.packtpub.com/splunk-how-to-work-with-multiple-indexes-tutorial/

I hope this helps a little 🙂
