Scheduling a job to clean event data from an index

rupesh_kumar
Engager

Hello Splunk Team,

How can I write and schedule a program (Java or Python) to clean the event data in an index?

My use case is:

  • I am generating metadata and some additional information from binary files. The dataset is large: 100-200 TB.
  • Each day we produce 1040-1100 metadata records, and some of them may be identical to records generated the day before.
  • I store these records in a relational database and use Splunk DB Connect (dbx) to index the data into Splunk.
  • I want to index a fresh copy of the 1040-1100 records each day to avoid duplication.
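The duplication comes from consecutive daily exports overlapping: if both exports are indexed, every record that appears in both ends up in the index twice. A minimal Python sketch, assuming each record carries a unique key column (the name `record_id` and the helper name are hypothetical), that separates today's records into new ones and ones already present yesterday:

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def split_new_and_repeated(
    yesterday: Iterable[Record], today: Iterable[Record], key: str = "record_id"
) -> Tuple[List[Record], List[Record]]:
    """Split today's records into (new, repeated) relative to yesterday.

    `key` is a hypothetical unique identifier column; replace it with
    whatever uniquely identifies a metadata record in your database.
    """
    # Keys of everything that was already exported (and indexed) yesterday.
    seen = {rec[key] for rec in yesterday}
    new: List[Record] = []
    repeated: List[Record] = []
    for rec in today:
        if rec[key] in seen:
            repeated.append(rec)  # would be indexed a second time
        else:
            new.append(rec)
    return new, repeated


if __name__ == "__main__":
    day1 = [{"record_id": "a"}, {"record_id": "b"}]
    day2 = [{"record_id": "b"}, {"record_id": "c"}]
    new, repeated = split_new_and_repeated(day1, day2)
    print([r["record_id"] for r in new])       # ['c']
    print([r["record_id"] for r in repeated])  # ['b']
```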

Please share your input on this.

Thanks in advance.

strive
Influencer

You can do this with a shell script containing three commands:
Command 1: splunk stop
Command 2: splunk clean eventdata -index <index_name>
Command 3: splunk start

Schedule this script to run before you index new data.

The same logic can be used in a Python script. As far as I know, you would still have to schedule that Python program to run from a shell script.
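A minimal Python sketch of the same three steps, assuming you pass in the path to the `splunk` binary and that the `-f` flag skips the interactive confirmation prompt of `splunk clean` (verify both on your version). The helper name `clean_index` is hypothetical. Using `subprocess.run(..., check=True)` stops on the first failing command, and the `finally` clause restarts Splunk even if the clean step fails, so the server is not left down.

```python
import subprocess
from typing import Callable


def clean_index(
    splunk_bin: str,
    index: str,
    runner: Callable[..., object] = subprocess.run,
) -> None:
    """Stop Splunk, clean one index's event data, then start Splunk again.

    splunk_bin and index are placeholders for your installation. runner is
    injectable so the command sequence can be tested without a real Splunk.
    """
    # Step 1: stop splunkd; check=True raises CalledProcessError on failure.
    runner([splunk_bin, "stop"], check=True)
    try:
        # Step 2: delete the indexed events. "-f" is assumed to suppress the
        # confirmation prompt, which would otherwise block a scheduled run.
        runner([splunk_bin, "clean", "eventdata", "-index", index, "-f"], check=True)
    finally:
        # Step 3: always try to restart, even if cleaning failed.
        runner([splunk_bin, "start"], check=True)
```

The scheduled job would then call something like `clean_index("/opt/splunk/bin/splunk", "my_metadata_index")`; both arguments are placeholders for your own install path and index name.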

somesoni2
SplunkTrust

Since the number of records is small and the data needs to be refreshed frequently (daily), why not store this metadata in a lookup table file instead of a Splunk index? You can use outputlookup after your dbx command to update the lookup table file from a search.

Something like this:

your dbx command | outputlookup YourLookupName.csv append=false 

append=false ensures the existing contents of the lookup file are overwritten, so you always have the latest data.
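outputlookup runs inside Splunk, so the following is not its implementation. It is a rough Python sketch that only mimics the file-level effect of the append option on a CSV, to show why overwriting avoids the duplicates that daily re-indexing creates. The helper names `write_lookup`/`read_lookup` and the `record_id` field are hypothetical.

```python
import csv
import os
import tempfile
from typing import Dict, List, Sequence


def write_lookup(
    path: str,
    fieldnames: Sequence[str],
    rows: Sequence[Dict[str, str]],
    append: bool = False,
) -> None:
    """Hypothetical helper mimicking outputlookup's append option on a CSV.

    append=False truncates the file (mode "w") so only the new rows remain;
    append=True adds rows after the existing ones (mode "a").
    """
    # Only skip the header when appending to a non-empty existing file.
    write_header = not (append and os.path.exists(path) and os.path.getsize(path) > 0)
    with open(path, "a" if append else "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(fieldnames))
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


def read_lookup(path: str) -> List[Dict[str, str]]:
    """Read the CSV back as a list of row dictionaries."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Demo: two daily refreshes with append=False keep only the second day's rows.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "YourLookupName.csv")
        fields = ["record_id"]
        write_lookup(path, fields, [{"record_id": "a"}, {"record_id": "b"}])
        write_lookup(path, fields, [{"record_id": "b"}, {"record_id": "c"}])
        print([r["record_id"] for r in read_lookup(path)])  # ['b', 'c']
```

With append=True the second day's "b" would be stored twice, which is the same duplication the original question is trying to avoid.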

rupesh_kumar
Engager

somesoni2 - Thank you for your response.

rupesh_kumar
Engager

Strive - Yes, there seems to be no other way to clean the index. I am using the script/commands you mentioned to clean it.

strive
Influencer

According to the Splunk documentation, Splunk recommends stopping Splunk before cleaning event data. I have not tried cleaning event data without stopping Splunk, so I do not know the impact.

rupesh_kumar
Engager

Strive - I am looking for another solution. I don't want to stop and start the server, since this is my production server (enterprise app).