Deployment Architecture

Multiple Large Data Sets

fk319
Builder

I have several sources of data feeding into my Splunk server, and some of the data sets exceed 1 GB per day.

What is the best way to keep the data separated so that searches are quicker?

I have defined apps and sourcetypes, but not knowing the internals, I'm not sure if this is the direction to go in.

1 Solution

Stephen_Sorkin
Splunk Employee

In general, for data volumes up to tens of GB per day there's no real advantage in separating the data to make search faster. There are some cases, however, where it makes sense to separate data into multiple indexes to gain "coherency" in the layout of data on disk to speed up raw data retrieval.

Specifically, if you have a low volume data set that's intermingled with a high volume data set and you commonly report on the entirety of the low volume data set, Splunk will have to decompress (and throw away) much of the high volume data set to get at the low volume one. In this case, segregating the lower volume data set into its own index can increase reporting performance from on the order of thousands of events per second to many tens of thousands of events per second.
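
As a minimal sketch of that segregation (the index name "lowvol", the sourcetype "myapp:audit", and the monitored path are all hypothetical), you would define the dedicated index on the indexer and route only the low volume input to it:

indexes.conf (on the indexer):

    # Dedicated index so the low-volume events sit contiguously on disk
    [lowvol]
    homePath   = $SPLUNK_DB/lowvol/db
    coldPath   = $SPLUNK_DB/lowvol/colddb
    thawedPath = $SPLUNK_DB/lowvol/thaweddb

inputs.conf (on the forwarder or input host):

    # Route just this input to the dedicated index;
    # everything else continues to land in the default index
    [monitor:///var/log/myapp/audit.log]
    sourcetype = myapp:audit
    index = lowvol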

If your searches are always over a small, scattered fraction of the data, and you can isolate that set, putting it in a separate index will help. If your reports are over many different small, scattered data sets, without overlap, it's simplest and best to just keep the data in a single index.
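
Once the data is in its own index, reports name that index explicitly so the search never has to decompress the high volume buckets. A sketch, again with the hypothetical names from above:

    index=lowvol sourcetype=myapp:audit | stats count by host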

