Getting Data In

Best way to index a file for once-off analysis

RVDowning
Contributor

What is the best way to index a file or two (user application files) for a one-time analysis? Should I create a new index, use a new sourcetype, copy the file(s) into a directory on the server, and then use the web-based data upload?

Then just delete everything afterwards?


kristian_kolb
Ultra Champion

Yes, in general a good approach. There is no need to invent sourcetypes if existing ones fit. Remember that an index's retention time is measured against the event timestamps, not against when the logs were ingested; setting the retention too short could cause events to be deleted before you have time to complete the analysis.
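As a concrete sketch, retention is set per index in indexes.conf; the stanza name below is an assumption, not from the thread:

```ini
# indexes.conf -- illustrative stanza; index name is an assumption
[oneoff_analysis]
# events older than this, measured against their own timestamps,
# are rolled to frozen (deleted by default)
frozenTimePeriodInSecs = 604800   # 7 days
```

If the files you upload contain old events, a short frozenTimePeriodInSecs could age them out almost immediately, which is the pitfall described above.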

grijhwani
Motivator

1) If you have full access to the deployment, then by far the simplest way to do this is to create an index for the purpose, give yourself query rights to it, and delete the entire index once you are sure you are finished with the data. This keeps everything neatly encapsulated and makes releasing the storage a doddle. The alternative route of selectively removing event data with a delete directive is laborious and risks selecting and deleting unexpected data. It will also not fully release the storage used until the surrounding index content expires and is expunged by housekeeping, because the data itself is not deleted from storage, per se, only de-indexed. And it carries a high processing overhead, since each event is deleted in turn.
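A sketch of that lifecycle using the Splunk CLI, assuming a running local instance and an illustrative index name (these are standard commands, but treat the exact invocations as a sketch to check against your version's documentation):

```
# run from $SPLUNK_HOME/bin; "oneoff_analysis" is an illustrative name
./splunk add index oneoff_analysis

# ... ingest the files, run the analysis, then tear down:

# wipe the indexed data (splunkd must be stopped for clean)
./splunk stop
./splunk clean eventdata -index oneoff_analysis -f
./splunk start

# or, on recent versions, remove the index definition entirely
./splunk remove index oneoff_analysis
```

The selective alternative mentioned above is the `delete` search command (e.g. `index=oneoff_analysis | delete`, which requires the can_delete role); as noted, it only de-indexes events and does not immediately free disk space.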

2) Bear in mind that whichever way you choose to go, deleting the data or the index will not refund the licence usage already incurred. If the data is large in comparison with your licence cap, you may cause yourself problems if this is a repeated occurrence.


RVDowning
Contributor

Most helpful, thanks much.


strive
Influencer

I asked about frequency because, if you do this regularly, you can create a separate index for it and set the data retention to 1 or 2 days. That way you don't need to delete or clean the index every time you finish an analysis.

We followed the same approach when trying proofs of concept. We created a new index called temp_analysis_idx and a new sourcetype called temp_analysis_srctype; everything else was as you described in your question. This kept us from interfering with other indexes. Since retention was set to 2 days, the data was deleted automatically and we did not need to run commands every time.
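As a sketch of that setup (the index and sourcetype names are the ones from this post; the paths are illustrative defaults):

```ini
# indexes.conf -- temporary analysis index with 2-day retention
[temp_analysis_idx]
homePath   = $SPLUNK_DB/temp_analysis_idx/db
coldPath   = $SPLUNK_DB/temp_analysis_idx/colddb
thawedPath = $SPLUNK_DB/temp_analysis_idx/thaweddb
frozenTimePeriodInSecs = 172800   # 2 days; data ages out on its own
```

A file can then be loaded once with, for example, `splunk add oneshot /path/to/file.log -index temp_analysis_idx -sourcetype temp_analysis_srctype` (the file path here is a placeholder).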

RVDowning
Contributor

Probably not often, but why would the frequency really matter?


strive
Influencer

The answer depends on how frequently you do this kind of one-time analysis.
