Scheduler, to clean event data -index

rupesh_kumar
Engager

Hello Splunk Team,

How can I write and schedule a program (Java/Python) to clean the event data?

My use case is:

  • I am generating metadata and some additional information from binary files. The dataset is large, 100-200 TB.
  • Each day we produce 1040-1100 records (metadata); some of them may be the same as records generated the day before.
  • I store these records in a relational database and use Splunk dbx to index the data in Splunk.
  • I want to index a fresh copy of the 1040-1100 records each day to avoid duplication.

Please provide your input on the same.

Thanks in advance.

strive
Influencer

You can do this with a shell script. Write a shell script with three commands:
Command 1: splunk stop
Command 2: splunk clean eventdata -index
Command 3: splunk start

Schedule this script to run before you index new data.

The same logic can be used in a Python script. As far as I know, you have to schedule that Python program to run using a shell script.
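
For anyone who wants the Python variant, here is a minimal sketch of the same stop / clean / start sequence. The binary path `/opt/splunk/bin/splunk` and the index name `my_metadata_index` are placeholders I made up, so substitute your own. I also added `-f` to the clean command so it does not wait for the confirmation prompt when run unattended; that is my addition, not part of the three commands above. It defaults to a dry run that only prints the commands.

```python
import subprocess


def build_clean_commands(splunk_bin, index_name):
    """Return the stop / clean / start commands, in the order they must run."""
    return [
        [splunk_bin, "stop"],
        # -f skips the interactive confirmation prompt (added for unattended runs)
        [splunk_bin, "clean", "eventdata", "-index", index_name, "-f"],
        [splunk_bin, "start"],
    ]


def clean_index(splunk_bin, index_name, dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    commands = build_clean_commands(splunk_bin, index_name)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True aborts the sequence if a step fails
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical path and index name; replace with your own values.
    clean_index("/opt/splunk/bin/splunk", "my_metadata_index", dry_run=True)
```

Note that with `check=True`, a failure in the clean step leaves Splunk stopped, so check the result before relying on it. Scheduling is left to cron or whatever scheduler you already use.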

somesoni2
Revered Legend

Since the number of records is small and they need to be updated frequently (daily), why don't you use a lookup table file to store this metadata instead of a Splunk index? You can use outputlookup after your dbx command to update the lookup table file from a search.

Something like this:

your dbx command | outputlookup YourLookupName.csv append=false

append=false ensures the data is overwritten, so you'll always have the latest data.

rupesh_kumar
Engager

somesoni2 - Thank you for your response.

rupesh_kumar
Engager

Strive - Yes, there is no other way to clean the index. I am using the script/commands you mentioned to clean the index...

strive
Influencer

As per the Splunk documentation, Splunk recommends stopping Splunk before cleaning the data. I have not tried cleaning event data without stopping Splunk, so I do not know the impact.

rupesh_kumar
Engager

Strive - I am looking for another solution. I don't want to stop/start the server, since this is my production server (enterprise app).
