Getting Data In

Huge disk space discrepancy on indexers in cluster

jonym4
Explorer

Some background:

We are having some problems in our environment. We have a cluster of indexers, and some of the servers are close to full while others still have 50-60% free disk space (keep in mind they all started with the same amount of disk space).

Hot/warm buckets are kept locally (and that is where the disk space problem is), and cold is moved over to the SAN.

I understand that there could be an issue with replication and that a couple of servers could end up with more disk space occupied than the others.

Most of the indexes have "homePath.maxDataSizeMB" set, but some are just rolled after a certain amount of time. I wasn't the one who set this up and I don't usually manage it, so I don't know the reasons behind how and why everything was configured the way it is.
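
For illustration, the two retention styles look roughly like this in indexes.conf (index names and values are made up, not our actual config):

[sized_index]
homePath = $SPLUNK_DB/sized_index/db
coldPath = /san/splunk/sized_index/colddb
thawedPath = $SPLUNK_DB/sized_index/thaweddb
# cap the hot/warm (local) portion; the oldest warm buckets roll to cold (SAN) once this is reached
homePath.maxDataSizeMB = 100000

[time_rolled_index]
homePath = $SPLUNK_DB/time_rolled_index/db
coldPath = /san/splunk/time_rolled_index/colddb
thawedPath = $SPLUNK_DB/time_rolled_index/thaweddb
# no size cap on hot/warm; data is only aged out by time (~90 days here)
frozenTimePeriodInSecs = 7776000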

My question is:

Is there any way of balancing the indexes between the indexers, besides setting strict size limits or measuring indexed data per day against the hot/warm disk space available?

The problem with this is that we have a lot of indexes (40+), and I don't really have the knowledge of the environment, or the time, to make a judgement on how to tune it.
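
To make it concrete, the kind of per-indexer measurement I have in mind would be something like this dbinspect search (just a sketch, I haven't validated it against our data):

| dbinspect index=* | search state=hot OR state=warm | stats sum(sizeOnDiskMB) AS hotWarmMB dc(bucketId) AS buckets by splunkServer index | sort - hotWarmMB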

1 Solution

jonym4
Explorer

Sorry for the late reply here, but we solved this a while ago. A backup script used during a hardware migration had left a lot of data in the same path as the hot/warm buckets in Splunk, and it was taking up ~30% of the local disk space.

We deleted it and everything is up and running fine now.
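
For anyone hitting something similar: comparing the disk usage Splunk tracks for its buckets against what the partition itself reports should expose that kind of gap. Something along these lines (untested sketch, run from a search head that has the indexers as peers; "$host*" is a placeholder and the REST field names may need adjusting):

| dbinspect index=* | search splunkServer="$host*" (state=hot OR state=warm) | stats sum(sizeOnDiskMB) AS splunkHotWarmMB by splunkServer

| rest /services/server/status/partitions-space splunk_server="$host*" | eval usedMB = capacity - free | table splunk_server mount_point capacity free usedMB

If usedMB on the hot/warm mount is much larger than splunkHotWarmMB, something other than Splunk buckets is eating the disk.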

jplumsdaine22
Influencer

🙂 Would you mind accepting your answer? There should be a button that says Accept Answer below my comment somewhere

muebel
SplunkTrust

As alacercogitatus mentioned, you'll want to look at how the heavy forwarders are connecting to the indexers (every forwarder has every indexer in outputs.conf, there aren't network connectivity issues, etc.), but one other thing worth mentioning is that you might have a large number of excess bucket copies on the indexers that are near capacity. These accumulate over time, especially if you are performing maintenance on the indexers, and depending on your replication/search factors. They can be removed periodically to free up space.

See: http://docs.splunk.com/Documentation/Splunk/6.2.3/Indexer/Removeextrabucketcopies
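
For example, per that doc the cleanup can be run from the cluster master CLI, something along the lines of (the index name is an optional placeholder to limit it to one index):

splunk remove excess-buckets <index_name>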

jonym4
Explorer

I already tried removing excess buckets a few times a week, but there was no major change in free disk space. I'm wondering whether there is data on these machines that isn't recognized by Splunk anymore. Is there any way of finding this out?

alacercogitatus
SplunkTrust

A quick item to check is your forwarders. Make sure that the forwarders are load balancing correctly, and have the entire set of indexers configured as outputs.
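
For example, an outputs.conf on each forwarder would list the full set of indexers in one load-balanced output group, roughly like this (hostnames and port are placeholders):

[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = indexer1.example.com:9997, indexer2.example.com:9997, indexer3.example.com:9997
# how often (in seconds) the forwarder switches to another indexer; 30 is the default
autoLBFrequency = 30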

martin_mueller
SplunkTrust

How many Heavy Forwarders are balancing their data onto how many Indexers? Too few HFs can cause "rolling denial of service" attacks against your own Indexers...

@alacer's search might be quicker like this:

| tstats count where index=* by splunk_server index sourcetype

alacercogitatus
SplunkTrust

Thanks @martin_mueller. I keep forgetting about tstats.

jonym4
Explorer

I did the search you mentioned but couldn't find the server with the disk problem, so I did another search against that server only, like:

|tstats count where index=* splunk_server="$host*" by index sourcetype

When doing this I only got a small number of results compared to the other servers: ~25,000 versus 147 million on one of the servers that has no disk issues.

There could be something missing, as I'm not really aware of all the intricacies of Splunk.

jonym4
Explorer

We are sending all data through Heavy Forwarders, and they then load balance across the indexer cluster.

alacercogitatus
SplunkTrust

Interesting. Why the Heavy Forwarders? Can you verify that the events are being spread correctly? You might be able to tell if there is a wayward input somewhere.

|metasearch index=* | stats count by splunk_server sourcetype

This will give you a better picture of which indexers are receiving which sourcetype, and if the counts aren't even, you can probably track down the wayward input.
