I know this has been asked before, but I'm hoping that I've misunderstood how deletion works.
The situation is that we have a single main index with 500,000,000 events in it, and 300,000,000 of those are the result of someone accidentally writing their Windows security logs from their production machines into the index.
We're extremely low on disk space, and because getting more provisioned is problematic, I hoped I might be able to remove those entries from the index somehow.
I know I can run a delete, but I understand this won't remove the data from the index. I also realise I can delete a whole index using the CLI, or delete data from an index based on an expiry strategy.
Can I remove data from an index when it's mixed with other data from the same time period, or am I completely stuck? Perhaps I could move the data we want to keep to a new index and then delete the erroneous data. Or am I permanently stuck with those 300,000,000 junk rows?
Please help
David
Other than deleting an index, the only standard way to remove data is to let it cross the per-index age- or size-based retention thresholds (by default 500GB and about six years), and delete indeed doesn't free up disk space... but you knew that already 🙂
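To illustrate why retention can't target the junk, here is a simplified model of the two thresholds, written as a sketch under assumptions. It assumes the indexes.conf settings frozenTimePeriodInSecs (default 188,697,600 seconds, roughly six years) and maxTotalDataSizeMB (default 500,000 MB), and it assumes that a bucket ages out only when its newest event is older than the limit, and that the oldest buckets go first when the size cap is exceeded. The Bucket type and the function are hypothetical illustrations, not Splunk code.

```python
from dataclasses import dataclass

# Assumed indexes.conf defaults: frozenTimePeriodInSecs (~6 years)
# and maxTotalDataSizeMB (500,000 MB, i.e. ~500GB).
DEFAULT_FROZEN_SECS = 188_697_600
DEFAULT_MAX_TOTAL_MB = 500_000


@dataclass
class Bucket:
    bucket_id: int
    newest_time: int  # epoch seconds of the newest event in the bucket
    size_mb: float


def buckets_to_freeze(buckets, now, frozen_secs=DEFAULT_FROZEN_SECS,
                      max_total_mb=DEFAULT_MAX_TOTAL_MB):
    """Return ids of buckets this simplified model would freeze (roll out)."""
    frozen = set()
    # Age threshold: a bucket ages out only when its *newest* event is too old,
    # so a bucket mixing junk with recent good data is kept whole.
    for b in buckets:
        if now - b.newest_time > frozen_secs:
            frozen.add(b.bucket_id)
    # Size threshold: while the remaining buckets exceed the cap,
    # freeze the oldest ones first.
    remaining = sorted((b for b in buckets if b.bucket_id not in frozen),
                       key=lambda b: b.newest_time)
    total = sum(b.size_mb for b in remaining)
    for b in remaining:
        if total <= max_total_mb:
            break
        frozen.add(b.bucket_id)
        total -= b.size_mb
    return sorted(frozen)
```

In this model, freezing works on whole buckets selected by time and size, never by sourcetype, so the Windows events only disappear when the buckets holding them age or size out, taking any good events in those buckets with them.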
In theory you could manually delete individual buckets, but only if a bucket contains nothing but undesired events... however, that's likely a risky procedure and certainly needs working backups to be feasible.
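The "nothing but undesired events" condition for deleting a bucket by hand can be expressed as a small check. This is a sketch under assumptions: it assumes warm/cold bucket directories are named db_&lt;newest&gt;_&lt;oldest&gt;_&lt;id&gt; (replicated cluster buckets use a different prefix and are not handled), and it assumes you have already gathered per-bucket event counts by sourcetype yourself. That counts dictionary is a hypothetical input, and both functions are illustrations only.

```python
import re

# Assumed naming of warm/cold buckets: db_<newest>_<oldest>_<id>.
# Hot buckets and replicated (clustered) buckets are deliberately not matched.
BUCKET_RE = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")


def parse_bucket_dir(name):
    """Return (newest_epoch, oldest_epoch, bucket_id), or None if not a db_ bucket."""
    m = BUCKET_RE.match(name)
    if m is None:
        return None
    newest, oldest, bucket_id = (int(g) for g in m.groups())
    return newest, oldest, bucket_id


def deletable_buckets(counts_by_bucket, unwanted_sourcetypes):
    """Return bucket ids whose events are ALL of unwanted sourcetypes.

    counts_by_bucket: {bucket_id: {sourcetype: event_count}}, a hypothetical
    input you would have to gather yourself before trusting this result.
    """
    unwanted = set(unwanted_sourcetypes)
    safe = []
    for bucket_id, counts in sorted(counts_by_bucket.items()):
        present = {st for st, n in counts.items() if n > 0}
        # A single wanted event anywhere in the bucket makes it unsafe to delete;
        # an empty bucket is also skipped rather than assumed to be junk.
        if present and present <= unwanted:
            safe.append(bucket_id)
    return safe
```

Even when a bucket passes this check, the caveat above still applies: take a working backup before removing anything.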
As for moving data to a new index selectively... I don't know of a way to do that. You could, of course, re-index from the raw data.
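If you do re-index from raw data, the junk can be filtered out before it is sent again. The sketch below assumes the raw events are available as JSON lines carrying a "sourcetype" field. That format is an assumption made for illustration; if your raw data is plain log files, you would filter by source file or host instead. The function name is hypothetical.

```python
import json


def filter_events(lines, unwanted_sourcetypes):
    """Yield JSON lines whose 'sourcetype' is not in the unwanted set."""
    unwanted = set(unwanted_sourcetypes)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("sourcetype") not in unwanted:
            yield line  # pass the original line through unchanged
```

Passing the original line through, rather than re-serialising it, keeps the event text byte-for-byte identical for re-ingestion.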
In addition to separating indexes and introducing temporary indexes for testing purposes, I avoid using the default/main index entirely in production environments. That way any data added carelessly without specifying an index can safely be dropped by cleaning the index.
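One way to enforce the "never land in main" practice is to check inputs.conf for stanzas that don't route to an explicit index. This is a sketch that assumes the default index is main unless configured otherwise, that keys in a [default] stanza are inherited by every other stanza, and that the file parses as INI with Python's configparser (backslash line continuations are not handled). The function name is hypothetical.

```python
import configparser


def inputs_landing_in_main(conf_text):
    """Return inputs.conf stanzas whose effective index is main (or unset)."""
    parser = configparser.ConfigParser(
        interpolation=None,         # .conf values may contain literal % signs
        default_section="default",  # treat [default] keys as inherited
        strict=False,               # tolerate repeated stanzas/keys
    )
    parser.optionxform = str        # keep key case as written
    parser.read_string(conf_text)
    flagged = []
    for name in parser.sections():
        # Assumption: an input with no index setting goes to main.
        index = parser.get(name, "index", fallback="").strip() or "main"
        if index == "main":
            flagged.append(name)
    return flagged
```

Running this over each inputs.conf before deployment would surface inputs like the Windows security log one before they reach the default index.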
I thought that might be the case. We'll be more careful about separating out indexes in future.