
How do I calculate the approximate volume of data that needs to be indexed, in order to procure licensing, when there are multiple sources?

pavankemisetti
New Member



pavankemisetti
New Member

Thank you, Rich.


Richfez
SplunkTrust

So, the most accurate and probably "best" answer here is to call your Splunk rep. They have teams of people they can rely on for pre-sales work, like identifying approximate data volumes. I say this is probably best because it's a "full-featured" solution: they'll help you in plenty of other ways, too.

But for your specifics, there are a couple of ways.

First, count what you have. Like, literally. Let's say you have log files that you roll once per day (so one file is one day), and each file averages about 100 MB, with a maximum file size in the past week of 150 MB. Your first guess might be 150 MB * 1.25, because although Splunk compresses data, it also has overhead. That's about 187 MB/day. I'd always leave some room for growth and slack, so call it at least 200 MB.
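The back-of-the-envelope math above can be written as a tiny Python sketch. The 1.25 factor comes straight from the post; rounding up to the next 50 MB is just one hypothetical way to build in the growth slack, and the function name and sample week are made up for illustration.

```python
import math


def first_guess_mb_per_day(daily_file_sizes_mb, overhead_factor=1.25, round_to_mb=50):
    """Rough daily estimate: worst recent day, padded, then rounded up for slack."""
    peak = max(daily_file_sizes_mb)      # largest daily file in the sample window
    padded = peak * overhead_factor      # e.g. 150 * 1.25 = 187.5
    # Hypothetical slack policy: round up to the next multiple of round_to_mb.
    return math.ceil(padded / round_to_mb) * round_to_mb


# One made-up week of daily file sizes (MB): averages 100, peaks at 150.
last_week = [95, 100, 90, 150, 85, 80, 100]
print(first_guess_mb_per_day(last_week))  # 200
```

Using the peak rather than the average is deliberate: a license sized to the average day would be exceeded on the busy days.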

Your next bet is to just stand up a single-box Splunk install. A throwaway VM with reasonable specs (2+ cores, 4+ GB of RAM and, say, 30 or 40 GB of disk space) can be used as a non-production box running Splunk Free to ingest a small portion of those files. Then take a look at your index size. For instance, on that particular set of data, you may find compression is great, so it's only 150 MB/day after all is said and done. Or maybe it has lots of indexed fields and it's 300 MB/day. Either way, this approach is fairly accurate.
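To turn a partial test ingest into a per-day number, you can scale by the ratio of index growth to raw input. This is a minimal sketch, assuming the sample is representative of a full day and that you read the index-size growth off the test box yourself; `scale_sample` and its parameters are hypothetical names, not a Splunk API.

```python
def scale_sample(sample_raw_mb, sample_indexed_mb, full_day_raw_mb):
    """Extrapolate a partial test ingest to a full day of the same data.

    sample_raw_mb:     size of the files fed to the test box
    sample_indexed_mb: how much the index grew after ingesting them
    full_day_raw_mb:   size of one full day of the same kind of data
    """
    if sample_raw_mb <= 0:
        raise ValueError("sample_raw_mb must be positive")
    # Below 1.0 means compression won; above 1.0 means overhead
    # (e.g. lots of indexed fields) won.
    ratio = sample_indexed_mb / sample_raw_mb
    return full_day_raw_mb * ratio


# Made-up numbers for a 150 MB peak day, mirroring the two cases above.
print(scale_sample(20, 20, 150))  # 150.0 (compression offsets overhead)
print(scale_sample(20, 40, 150))  # 300.0 (lots of indexed fields)
```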

Repeat this for the different kinds of data you have. For instance, all IIS or Apache logs will be pretty similar to one another, so the same calculations work. Windows Event Logs will be different, though; for those you will pretty much have to pull in a handful of servers to see how much data is really generated.

Either way, take those numbers and multiply them out by how many of those files or servers you have. Repeat until you've accounted for everything you want to pull in.
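Putting the pieces together: average a few sampled servers where the data varies, multiply each per-unit figure by its count, and sum across data types. This is a sketch only; the source-type labels, volumes and server counts are invented for illustration, and treating 1 GB as 1024 MB is an assumption, so check how your license quote defines it.

```python
from dataclasses import dataclass


@dataclass
class SourceType:
    name: str                   # hypothetical label, e.g. "iis"
    mb_per_unit_per_day: float  # daily volume for ONE file or server of this type
    unit_count: int             # how many files/servers of this type you have


def average_mb_per_day(samples):
    """Average daily volume across a handful of sampled servers."""
    return sum(samples) / len(samples)


def total_gb_per_day(sources, mb_per_gb=1024):
    """Multiply each per-unit estimate by its count, then sum across types."""
    total_mb = sum(s.mb_per_unit_per_day * s.unit_count for s in sources)
    return total_mb / mb_per_gb


sources = [
    # Web logs are similar to one another, so one per-server figure covers them.
    SourceType("iis", 256, 8),
    # Windows Event Logs vary, so average a few sampled servers first.
    SourceType("wineventlog", average_mb_per_day([60, 70, 58, 68]), 32),
]
print(total_gb_per_day(sources))  # 4.0
```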

Or again, call your rep. 🙂

Happy Splunking!
-Rich
