Deployment Architecture

Different storage space occupancy within index cluster

seilemor
Engager

Dear Splunk community,

I'm relative new within the splunk cluster topic. Since three years I'm using Splunk in a standalone environment and has now switched over to a cluster to get the benefit of HA.

My new cluster works as expected. Seachfactor 2 and replication factor 3. In my cluster I have one searchhead, one master and three peernodes.

Now I've noticed that the disk storage occupucity is different between the peernodes. In my mind the clustering with replication factor = 3 means that each peernode will hold the same data and based on this the disk usage should be similar. Cn someone explain me this behaviour or did I have maybe an issue in my cluster configuration!? Attached you can find three screenshots regarding the disk usage of one of my indices.

alt text
alt text

Best regards
seilemor

0 Karma
1 Solution

tiagofbmm
Influencer

If one of the indexers with a searcheable copy fails, then Splunk starts a procedure to regain the search factor it lost: Splunk will copy the tsidx files from the remaining peer that has them to the one that has only the raw data.

More, if both Splunk peers with searchable copies die, you still can live with that because your remaining index has the raw data in it, and Splunk can remake the tsidx files from that raw data and so everything in Splunk will be searchable again.
Note that this last process of rebuilding tsidx files is time and resource intensive. But you'd still have everything after it finishes

View solution in original post

0 Karma

seilemor
Engager

Thanks to everyone. Question has been answered. I've stopped Splunk on one of my both servers which are hold the IDX files. Now I can see all the IDX files also on the third machine and the used disk storage is similar to each other.

0 Karma

tiagofbmm
Influencer

If one of the indexers with a searcheable copy fails, then Splunk starts a procedure to regain the search factor it lost: Splunk will copy the tsidx files from the remaining peer that has them to the one that has only the raw data.

More, if both Splunk peers with searchable copies die, you still can live with that because your remaining index has the raw data in it, and Splunk can remake the tsidx files from that raw data and so everything in Splunk will be searchable again.
Note that this last process of rebuilding tsidx files is time and resource intensive. But you'd still have everything after it finishes

0 Karma

seilemor
Engager

Hi,

thank your for your response. What will happen now if one of the two systems which are holding the IDX files is going down!? Will the third machine which only hold the _raw data generate the IDX files too!?

I've checked the size on some of the buckets through the cluster in relation to the IDX and raw files and get exactly what you are telling me. Without IDX files one example bucket has 99MB, with IDX the bucket has 480MB.

Regards
seilemor

0 Karma

tiagofbmm
Influencer

Hi,

You are correct by saying each of the Peers should have one copy of each bucket (just the _raw data I mean) as Splunk tries to spread that.

What you are not considering is that the major part of space occupied by a bucket is not the _raw data itself. It is the "indexing" process that creates the structure to make data searchable. And in this case, you have only search_factor of 2, which might mean 2 of your peers are getting the two copies of the "searchable part" as they should. The remaining one will not receive any copy because it does not need to. The Search Factor is already respected with two copies!

Check your monitoring console to make sure it is working properly!

valiquet
Contributor

I agree with tiagofbmm, IDX files can take around 50% of your stored data. Since you set a search factor of 2, only 2 copies our of 3 has the tsidx

0 Karma

p_gurav
Champion

Hi,

Can you try adding forceTimebasedAutoLB = true parameter to your outputs.conf.

Also, is all indexers have same storage capacity?

0 Karma
Get Updates on the Splunk Community!

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...

What’s New in Splunk Security Essentials 3.8.0?

Splunk Security Essentials (SSE) is an app that can amplify the power of your existing Splunk Cloud Platform, ...

Let’s Get You Certified – Vegas-Style at .conf24

Are you ready to level up your Splunk game? Then, let’s get you certified live at .conf24 – our annual user ...