Managing disk space when moving data from old index to new index

bharathkumarnec
Contributor

Hi All,

We want to move data from one index to another. Our scenario:

1) Create a new index ABC and move the data for a specific source into it from the old index DEF.
2) Delete the moved source data from the old index DEF using the | delete command.

As we understand it, | delete does not remove data from disk; it only makes the data non-searchable. So in our case, will the source data consume disk space in both ABC and DEF?

For example, if the source content is 10GB, do we need 20GB on disk to hold it in both ABC and DEF?

Is there any other option that avoids the extra disk usage?

If we use the | collect command, will it distribute the data evenly across all the indexers in our environment?

Any help in this regard is highly appreciated.

Thanks in advance!

Regards,
BK

cpetterborg
SplunkTrust

If you need to recover disk space, just about the only way (see below for my suggestion with collect) is to delete all the buckets that hold the data and then re-index it, routing each piece into the index where you want it. I know this is not the answer you are looking for, but the delete command doesn't free disk space, and you can't simply move pieces of data from one index to another.

If you keep the DEF buckets and only run delete on the data that was copied to ABC (which just hides it from searches), you will use more disk space. You won't need 2x the disk space unless ABC contains almost all of the data from DEF, but you will need 1x plus the fraction of DEF that was copied into ABC. If 50% of the data from DEF is copied to ABC, you will need 1.5x the disk storage.

You can't recover that disk space without removing every DEF bucket that contains ABC data, and then you will lose all of the other DEF data in those buckets as well. You would then have to re-index that data into DEF (which will count against your license).

If you use the collect command (whose original purpose was to create summary data), you can copy data to another index, but it will count against your license if you set a sourcetype other than the default stash. You could run collect twice: once to copy the data you wish to keep from DEF into a new index (DEF2, for example), and once to copy the data you want moved into ABC. Then you could delete DEF completely and rename DEF2 to DEF. This operation is not for the faint of heart. If you can complete it in one or two days, licensing should not be an issue, and even if it is, Splunk Support is very helpful about sending a reset key. Collect should distribute the data across your indexers.
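
As a rough sketch only, the two collect passes could look something like the searches below. The source path and the one-day time window are made-up placeholders, and the index names are written in lowercase because Splunk index names are lowercase. The first search copies the source that should move into abc, the second copies everything else into def2, and the last one gives an event count to compare against the same window in def. Leaving the sourcetype off collect keeps the default stash sourcetype. Whether host, source, and sourcetype come through the way you need is something to check in a test environment.

```
index=def source="/var/log/example.log" earliest=-1d@d latest=@d
| collect index=abc

index=def NOT source="/var/log/example.log" earliest=-1d@d latest=@d
| collect index=def2

index=abc earliest=-1d@d latest=@d
| stats count
```

Working through the data one time window at a time keeps each search small and makes it easier to confirm that the counts match before anything is removed from DEF.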

You are looking at a very difficult scenario. I would create a test environment and see if it works for you.

DalJeanis
SplunkTrust

Just a thought, but is there some trick that could be done using the retention period? It seems like the data won't really roll off into cold or frozen until the last relevant data in the bucket ages out, so I don't see a scenario where this would gain anything.

Sounds like a "compress" option for buckets that have experienced major deletions would be on everybody's wish list.

cpetterborg
SplunkTrust

Retention on the data buckets won't work, but you can shorten the age at which the tsidx files are reduced (tsidx reduction). That saves disk space, at the cost of slower searches over the data whose tsidx files have been reduced.
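
For illustration, a minimal indexes.conf sketch of that setting might look like the following. The stanza name stands in for the DEF index from this thread, and the 30-day threshold is only an example value, not a recommendation.

```
# indexes.conf on the indexers (sketch; "def" is the placeholder index from this thread)
[def]
# Turn on tsidx reduction for this index
enableTsidxReduction = true
# Reduce tsidx files in buckets older than 30 days (value is in seconds; example choice)
timePeriodInSecBeforeTsidxReduction = 2592000
```

Note that this applies to the whole index, not to a single source within it.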

bharathkumarnec
Contributor

Thanks for your detailed inputs. If we manage retention, it will apply not just to a single source but to the whole index, which contains multiple sources.

Since we cannot change retention for the other logs, this will not be a solution for us either, correct?

pradeepkumarg
Influencer

If you want to get rid of the data from an entire index, you can use the clean command (Splunk must be stopped first):
splunk clean eventdata -index <index_name>

http://docs.splunk.com/Documentation/Splunk/6.6.2/Indexer/RemovedatafromSplunk

bharathkumarnec
Contributor

The requirement is to delete the data for certain sources only, not the complete index, as there are other sources in that index as well.
