Getting Data In

Best way to index a file for one-off analysis

RVDowning
Contributor

What is the best way to index a file (a user application file) or two for a one-time analysis? Should I create a new index, use a new sourcetype, copy the file(s) into a directory on the server, and then use the web-based data upload?

Then just delete everything afterwards?
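(For reference, an alternative to the web-based upload is the splunk CLI's one-shot input, which reads a file once without setting up a monitored input. A minimal sketch; the index and sourcetype names here are hypothetical:)

```shell
# Index a single file once, into a scratch index, with a chosen sourcetype.
# "temp_analysis_idx" and "temp_analysis_srctype" are example names.
splunk add oneshot /path/to/app.log -index temp_analysis_idx -sourcetype temp_analysis_srctype
```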


kristian_kolb
Ultra Champion

Yes, in general that's a good idea. No need to invent sourcetypes if the existing ones are relevant. Remember that the retention time of an index relates to the event timestamps, not to when the logs were ingested; setting too short a retention time could cause the events to be deleted before you have time to complete the analysis.

grijhwani
Motivator

1) If you have full access to the deployment, by far the simplest way to do this is to create yourself an index for the purpose, give yourself query rights to it, and delete the entire index once you are sure you are finished with the data. It keeps the data neatly encapsulated and makes the release of storage a doddle. The alternative route of selectively removing the event data with a delete directive is laborious, risks selecting and deleting unexpected data, and has a high processing overhead, since each indexed event is deleted in turn. It also will not fully release the storage used until the surrounding index content is expired and expunged by house-keeping, because the data itself is not deleted from storage, per se, only de-indexed.
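(The dedicated-index route above can be sketched with the splunk CLI; the index name is hypothetical:)

```shell
# Create a scratch index just for this analysis
splunk add index temp_analysis_idx

# ...load the data, run your searches against index=temp_analysis_idx...

# When finished, remove the whole index and release its storage in one step
splunk remove index temp_analysis_idx
```

By contrast, the selective route is a search ending in the `delete` command (e.g. `index=temp_analysis_idx | delete`, which requires the can_delete role), and as noted above it only de-indexes events rather than freeing the storage.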

2) Bear in mind that whichever way you choose to go, deleting the data/index will not cancel the licence usage incurred. If the data is large in comparison with your licence cap you may cause yourself problems if this is a repeated occurrence.


RVDowning
Contributor

Most helpful, thanks much.


strive
Influencer

I asked about the frequency so that you could create a separate index for this and set its data retention to 1 or 2 days. That way you need not delete or clean the index every time you finish an analysis.

We followed the same approach while trying proofs of concept. We created a new index for this, called temp_analysis_idx, and a new sourcetype called temp_analysis_srctype; everything else was as you describe in your question. This kept us from interfering with other indexes. Since data retention was set to 2 days, the data was deleted automatically and we did not need to run clean-up commands each time.
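(The setup described above can be sketched as an indexes.conf fragment. This is a minimal sketch, not a complete stanza; the paths assume the default $SPLUNK_DB layout, and the index name is the one from this post:)

```
# indexes.conf -- scratch index with a short retention period
[temp_analysis_idx]
homePath   = $SPLUNK_DB/temp_analysis_idx/db
coldPath   = $SPLUNK_DB/temp_analysis_idx/colddb
thawedPath = $SPLUNK_DB/temp_analysis_idx/thaweddb
# Events older than 2 days (2 x 86400 s) are frozen, i.e. deleted by default
frozenTimePeriodInSecs = 172800
```

Note that, as mentioned earlier in the thread, the retention clock runs against the event timestamps, not the ingest time.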

RVDowning
Contributor

Probably not often, but why would the frequency really matter?


strive
Influencer

The answer depends on how frequently you do this kind of one-off analysis.
