Getting Data In

How do I calculate the approximate volume of data that needs to be indexed, in order to procure licensing, given that there will be multiple data sources?

pavankemisetti
New Member



pavankemisetti
New Member

Thank you Rich


Richfez
SplunkTrust

So, the most accurate and probably "best" answer here is to call your Splunk rep. If you contact them, they have teams of people they can rely on for pre-sales work like identifying approximate data volumes and stuff. I say this is probably best because it's a "full featured" solution, in that they'll help you in so many other ways too.

But, for your specifics - there's a couple of ways.

First, count what you have. Literally. Say you roll your log files once per day (so one file is one day), each file averages about 100 MB, and the largest file in the past week was 150 MB. A first guess might be 150 MB * 1.25, because although Splunk compresses data, indexing also adds overhead. That works out to about 188 MB/day. I'd always leave some room for growth and slack, so call it at least 200 MB/day.
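The arithmetic above can be sketched as follows. The figures are the example numbers from this thread; substitute your own observed file sizes, and treat the 1.25 overhead factor as a rule of thumb, not a Splunk-published constant.

```python
# Rough daily license estimate from observed log file sizes.
max_daily_raw_mb = 150   # largest daily log file seen in the past week
overhead_factor = 1.25   # rule-of-thumb allowance for indexing overhead

estimate_mb = max_daily_raw_mb * overhead_factor

# Round up to a comfortable figure with room for growth and slack.
print(f"Estimated: {estimate_mb:.0f} MB/day; license for at least 200 MB/day")
```

The point of using the weekly maximum rather than the average is that Splunk licensing is enforced per day, so a single busy day matters more than the mean.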

Your next bet is to stand up a single-box Splunk install. A throwaway VM with reasonable specs (2+ cores, 4+ GB of RAM, and say 30 or 40 GB of disk space) can run Splunk Free as a non-production box that ingests a small sample of those files. Then take a look at the resulting index size. On one set of data you may find compression is great, so it's only 150 MB after all is said and done; on another, lots of indexed fields might push it to 300 MB/day. Either way, this method is fairly accurate.
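Once you have a test ingest, you can extrapolate from the sample to a full day. A minimal sketch, assuming you've noted the raw size of the sample you fed in and the index size the test instance reports afterward (the numbers below are purely illustrative):

```python
# Extrapolate full-day index footprint from a small test ingest.
sample_raw_mb = 10.0     # raw size of the sample fed into the test box
sample_index_mb = 15.0   # index size reported after ingesting the sample
full_day_raw_mb = 100.0  # your real daily raw log volume

# Index size per MB of raw data, measured on the actual data.
ratio = sample_index_mb / sample_raw_mb
projected_index_mb = full_day_raw_mb * ratio

print(f"Projected index footprint: {projected_index_mb:.0f} MB/day")
```

Measuring the ratio on your own data is what makes this more accurate than a flat overhead factor: heavily compressed plain text and field-rich data land in very different places.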

Repeat this on each different kind of data you have. For instance, all IIS or Apache logs will be pretty similar to one another, so the same calculations work across them. Windows Event Logs will be different, though; for those you will pretty much have to pull in a handful of servers to see how much data is really generated.

Either way, take those numbers and multiply them out by how many of those files or servers you have. Repeat until you've accounted for everything you want to pull in.
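Putting the steps together, the final roll-up is just a sum over source types. A sketch with made-up placeholder figures (the source names, per-host rates, and host counts are all hypothetical; plug in your own measurements from the steps above):

```python
# Roll up per-source estimates across all hosts.
# name: (estimated MB/day per host, number of hosts) -- illustrative only.
sources = {
    "iis_logs": (150, 4),
    "windows_event_logs": (300, 10),
}

total_mb_per_day = sum(per_host * count for per_host, count in sources.values())
print(f"Total estimated ingest: {total_mb_per_day} MB/day")
```

Add a line per source type as you measure it, and the total gives you a defensible starting point for a license conversation.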

Or again, call your rep. 🙂

Happy Splunking!
-Rich
