HOW TO: Purge (CLEAN) Selective Data from an Index

salighie
New Member

According to the documentation I'm reading, permanently purging selected data (events matching one or more search filters) doesn't appear to be possible.

I'm wondering whether there is a workaround or procedure of some sort.

For example, is it possible to swap the data, much like swapping partitions in a relational database?
1. Copy the data I want to keep from the "PROD" index into a "TEMP" index (stash)
ex: index=PROD source=something other_filters | collect index=TEMP
2. Clean the "PROD" index
ex: splunk clean eventdata -index index_name
3. Re-inject the data from the "TEMP" index back into "PROD"
ex: index=TEMP | collect index=PROD
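The three steps above can be laid out as one ordered plan. This is a minimal sketch, not a tested procedure: `build_swap_plan` and `keep_filter_from_purge` are hypothetical helpers that only build command strings, and the `NOT (...)` inversion assumes you start from a filter that describes the data to purge. The plan also wraps the clean in `splunk stop` / `splunk start`, because `splunk clean eventdata` has to be run while Splunk is stopped. Run the two SPL searches with the time range set to All time, since collect only copies the events the search actually returns.

```python
from typing import List


def keep_filter_from_purge(purge_filter: str) -> str:
    """Hypothetical helper: invert a purge filter so the search returns
    every event you want to KEEP (everything not matching the filter)."""
    return f"NOT ({purge_filter})"


def build_swap_plan(prod: str, temp: str, keep_filter: str) -> List[str]:
    """Hypothetical helper: return the swap procedure as ordered command
    strings. Nothing is executed here; the SPL lines are meant for the
    Search app (time range: All time), the others for the command line."""
    return [
        # Step 1 (SPL): copy the events to keep into the temporary index.
        f"index={prod} {keep_filter} | collect index={temp}",
        # Step 2 (CLI): clean eventdata must be run while Splunk is stopped.
        "splunk stop",
        f"splunk clean eventdata -index {prod}",
        "splunk start",
        # Step 3 (SPL): copy the kept events back into the production index.
        f"index={temp} | collect index={prod}",
    ]


if __name__ == "__main__":
    purge = "source=something other_filters"
    for step in build_swap_plan("PROD", "TEMP", keep_filter_from_purge(purge)):
        print(step)
```

Run as a script, the first line printed is the step 1 search, `index=PROD NOT (source=something other_filters) | collect index=TEMP`, followed by the stop/clean/start commands and the step 3 search.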

Any thoughts or guidance would be greatly appreciated.

Regards,
Seb


adonio
Ultra Champion

IMHO your approach will work; however, consider a couple of things:
1. If you use a sourcetype other than stash, the collected data will count against your license.
2. Some search-time extractions might not work, because the data will have a new sourcetype.
3. If anything refers to your previous index, you'll have to modify it, possibly more than once.
4. It will be very tough (if not impossible) to do if you have an indexer cluster architecture.

Hope it helps.


salighie
New Member

I'll give it a shot and see what happens.

Regarding your considerations:
1. I was under the impression that "collect" automatically sets the sourcetype to "stash". Is that not the case?
2. Can you give an example of which search extractions might not work? And what exactly do you mean by "as you have a new sourcetype"?
3. I assume I would need to put the index in "single-user" mode to prevent reads, and then pause the data connectors to stop any writes.
4. I don't have a clustered architecture; I have a small dev implementation.

Thanks for the reply.
Regards


adonio
Ultra Champion

Indeed, collect sets the sourcetype to stash.
Most of the time, users tie search-time extractions, as well as search filters, to the sourcetype. That means that if you have a search like index=a sourcetype=b, you'll have to change it to index=c sourcetype=stash.
Also, if you have field extractions based on sourcetype b, you will have to modify them.
I'm not sure what single-user mode means as you describe it; however, definitely stop any incoming data, since the collect command acts only on the data fetched at search/execution time.
Good luck with your purge!

If this answers your questions, kindly accept the answer so others know it worked for you.


salighie
New Member

By single-user mode I mean preventing other accounts from searching the index while I'm running through the procedure.

Thanks, I'll run through the procedure and report back on what I find.

You have been very helpful and insightful. I wanted to award you some points, but the system won't allow me; it says I don't have enough Karma.
