Reporting

Scheduler, to clean event data -index

rupesh_kumar
Engager

Hello Splunk Team,

How can I write and schedule a program (Java/Python) to clean the eventdata?

My use case is:

  • I am generating metadata and some additional information from binary files. The dataset is large, 100-200 TB.
  • Each day we produce 1040-1100 records (metadata); some of these records may be the same as those generated the day before.
  • I store these records in a relational database and use Splunk dbx to index the data in Splunk.
  • I want to index a fresh copy of the 1040-1100 records each day to avoid duplication.

Please provide your input on this.

Thanks in advance.

strive
Influencer

You can do this with a simple shell script. Write a shell script with three commands:
Command 1: splunk stop
Command 2: splunk clean eventdata -index
Command 3: splunk start

Schedule this script to run before you index new data.

The same logic can be used in a Python script. As far as I know, you still have to schedule that Python program to run using some shell script.
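For the Python variant, here is a minimal sketch of the same three steps. It makes several assumptions that are not in the thread: the install path `/opt/splunk/bin/splunk`, the helper names `build_commands` and `clean_index`, and that the index name is passed as a command-line argument. It also adds the `-f` flag so the clean step does not wait for interactive confirmation. Treat it as a starting point, not a tested production script.

```python
import subprocess
import sys

# Assumed install path; adjust for your environment.
SPLUNK_BIN = "/opt/splunk/bin/splunk"


def build_commands(index_name, splunk_bin=SPLUNK_BIN):
    """Return the three CLI invocations in the order they must run."""
    return [
        [splunk_bin, "stop"],
        # -f skips the confirmation prompt so the script can run unattended.
        [splunk_bin, "clean", "eventdata", "-index", index_name, "-f"],
        [splunk_bin, "start"],
    ]


def clean_index(index_name, runner=subprocess.run):
    """Stop Splunk, wipe one index, then start Splunk again."""
    stop_cmd, clean_cmd, start_cmd = build_commands(index_name)
    runner(stop_cmd, check=True)
    try:
        runner(clean_cmd, check=True)
    finally:
        # Restart even if the clean step failed, so Splunk is not left down.
        runner(start_cmd, check=True)


# Hypothetical usage: python clean_index.py my_metadata_index
if __name__ == "__main__" and len(sys.argv) == 2:
    clean_index(sys.argv[1])
```

Scheduling is left to the operating system (for example a cron entry that runs the script shortly before the daily dbx input). Note that this wipes every event in the named index and takes Splunk offline for the duration.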

somesoni2
Revered Legend

Since the number of records is small and they need to be refreshed frequently (daily), why don't you store this metadata in a lookup table file instead of a Splunk index? You can use outputlookup after your dbx command to update the lookup table file from a search.

Something like this:

your dbx command | outputlookup YourLookupName.csv append=false 

append=false ensures the lookup file is overwritten on each run, so you'll always have the latest data.

rupesh_kumar
Engager

somesoni2 - Thank you for your response.

rupesh_kumar
Engager

Strive - Yes, there is no other way to clean the index. I am using the script/commands you mentioned to clean the index...

strive
Influencer

According to the Splunk documentation, Splunk should be stopped before cleaning event data. I have not tried cleaning event data without stopping Splunk, so I do not know the impact.

rupesh_kumar
Engager

Strive - I am looking for another solution. I don't want to stop/start the server, since this is my production server (enterprise app).
