Getting Data In

How to estimate the approximate data volume to be indexed, across multiple sources, in order to procure licensing?

pavankemisetti
New Member

How do I calculate the approximate amount of data that needs to be indexed in order to procure licensing? There will be multiple sources.

pavankemisetti
New Member

Thank you Rich

Richfez
SplunkTrust

So, the most accurate and probably "best" answer here is to call your Splunk rep. If you contact them, they have teams of people they can rely on for pre-sales work like identifying approximate data volumes and stuff. I say this is probably best because it's a "full featured" solution, in that they'll help you in so many other ways too.

But, for your specifics - there are a couple of ways.

First, count what you have. Like, literally. Let's say you have log files that roll once per day (so one file is one day) and each file averages about 100 MB, with a maximum file size in the past week of 150 MB. Your first guess might be 150 MB * 1.25, because though Splunk compresses data it also has overhead. That's about 187 MB/day. I'd always leave some room for growth and slack, so call it at least 200 MB/day.
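If it helps, here is that arithmetic as a minimal Python sketch. The sample sizes are made-up numbers chosen to match the example (average 100 MB, peak 150 MB), the 1.25 factor is the rough overhead guess from above rather than an official Splunk figure, and the function names and the 50 MB rounding step are just assumptions for illustration.

```python
import math

def estimate_daily_mb(daily_sizes_mb, overhead=1.25):
    """Return the peak observed daily size multiplied by an overhead factor.

    The 1.25 default is the rule-of-thumb guess from the post, not a
    vendor-published number.
    """
    if not daily_sizes_mb:
        raise ValueError("need at least one daily size")
    return max(daily_sizes_mb) * overhead

def round_up_for_headroom(mb, step_mb=50):
    """Round up to the next multiple of step_mb to leave slack for growth.

    step_mb=50 is an arbitrary illustrative choice.
    """
    return math.ceil(mb / step_mb) * step_mb

# Hypothetical sizes (MB) of one week of daily log files.
week_mb = [95, 100, 110, 90, 150, 85, 70]

average_mb = sum(week_mb) / len(week_mb)      # 100.0
raw_estimate = estimate_daily_mb(week_mb)     # 150 * 1.25 = 187.5
planned = round_up_for_headroom(raw_estimate) # 200

print(f"average {average_mb:.0f} MB, estimate {raw_estimate} MB/day, plan for {planned} MB/day")
```

Using the peak day rather than the average is deliberate here: you want to size for the busy days, not the typical one.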

Your next bet is to just stand up a single-box Splunk install - a throwaway VM with reasonable specs (2+ cores, 4+ GB of RAM and say 30 or 40 GB of disk space) can be used as a non-production box running Splunk Free to just ingest a small portion of those files. Then take a look at your index size. So, for instance, on that particular set of data, you may find compression is great so it's only 150 MB after all is said and done. Maybe it has lots of indexed fields and it's 300 MB/day. Either way, this approach is fairly accurate.

Repeat on the different kinds of data you have. For instance, all IIS or Apache logs will be pretty similar to one another, so the same calculations work. Windows Event logs will be different, though; for those you will pretty much have to pull in a handful of servers to see how much data is really generated.

Either way, take those numbers and multiply them out by how many of those files or servers you have. Repeat until you've accounted for everything you want to pull in.
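The multiply-and-add step can be sketched in Python like this. Every source name, per-unit figure and count below is a hypothetical placeholder; plug in whatever you measured for your own data types.

```python
from dataclasses import dataclass

@dataclass
class Source:
    name: str                   # kind of data, e.g. a log type
    mb_per_unit_per_day: float  # measured or estimated MB/day for one file or server
    units: int                  # how many such files or servers you have

def total_daily_mb(sources):
    """Sum per-unit daily volume times unit count over all source types."""
    return sum(s.mb_per_unit_per_day * s.units for s in sources)

# Hypothetical inventory; replace with your own measurements.
sources = [
    Source("web access logs", 200, 4),     # 4 servers at ~200 MB/day each
    Source("Windows Event logs", 50, 20),  # 20 servers at ~50 MB/day each
]

total_mb = total_daily_mb(sources)
for s in sources:
    print(f"{s.name}: {s.mb_per_unit_per_day * s.units:.0f} MB/day")
print(f"total: {total_mb:.0f} MB/day ({total_mb / 1024:.2f} GB/day)")
```

Keeping one row per kind of data makes it easy to see which source dominates the total and to update a single estimate later.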

Or again, call your rep. 🙂

Happy Splunking!
-Rich
