Getting Data In

How to calculate the approximate volume of data that needs to be indexed, in order to procure licensing, when there are multiple sources?

pavankemisetti
New Member

How do I calculate the approximate amount of data that will need to be indexed, so I can procure the right license, given that there will be multiple data sources?


pavankemisetti
New Member

Thank you Rich


Richfez
SplunkTrust

So, the most accurate and probably "best" answer here is to call your Splunk rep. If you contact them, they have teams of people they can rely on for pre-sales work like identifying approximate data volumes and stuff. I say this is probably best because it's a "full featured" solution, in that they'll help you in so many other ways too.

But for your specifics, there are a couple of ways.

First, count what you have. Like, literally. Let's say you have log files you are rolling once per day (so one file is one day) and each file averages about 100 MB, with a maximum file size in the past week of 150 MB. Your first guess might be 150 MB * 1.25, because though Splunk compresses data, it also has overhead. That's about 188 MB/day. I'd always leave some room for growth and slack, so call it at least 200 MB/day.
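If you'd rather script that first rough pass than do it by hand, here is a minimal sketch in plain Python. The log path, the 1.25 overhead factor, and the growth headroom are assumptions carried over from the example above, not anything Splunk-specific:

```python
import glob
import os

# Hypothetical log location and factors -- adjust for your environment.
LOG_GLOB = "/var/log/myapp/*.log"   # one rolled file per day, per the example above
OVERHEAD_FACTOR = 1.25              # rough allowance for indexing overhead
GROWTH_HEADROOM = 1.10              # extra slack for growth

# Size of the largest (busiest) daily file, in MB.
sizes_mb = [os.path.getsize(p) / (1024 * 1024) for p in glob.glob(LOG_GLOB)]
peak_day_mb = max(sizes_mb) if sizes_mb else 0.0

estimate_mb_per_day = peak_day_mb * OVERHEAD_FACTOR * GROWTH_HEADROOM
print(f"Peak daily file: {peak_day_mb:.1f} MB")
print(f"Rough license estimate for this source: {estimate_mb_per_day:.1f} MB/day")
```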

Your next bet is to just stand up a single-box Splunk install - a throwaway VM with reasonable specs (2+ cores, 4+ GB of RAM, and say 30 or 40 GB of disk space) can be used as a non-production box running Splunk Free to ingest a small portion of those files. Then take a look at your index size. For instance, on that particular set of data you may find compression is great and it's only 150 MB after all is said and done, or maybe it has lots of indexed fields and it's 300 MB/day. Either way, this method is fairly accurate.
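If you only feed the test box a sample, projecting from that sample to your full daily volume is just a ratio. A minimal sketch, assuming you've measured the sample's raw size and the resulting index size yourself (all the numbers below are placeholders, not measurements):

```python
# Hypothetical measurements from the throwaway test box -- substitute your own numbers.
sample_raw_mb = 500.0        # raw MB of log data fed into the test instance
sample_index_mb = 650.0      # resulting index size on disk (e.g. lots of indexed fields)
full_raw_mb_per_day = 2000.0 # total raw MB/day across all files of this type

# How much bigger (or smaller) the indexed data ends up than the raw input.
expansion_ratio = sample_index_mb / sample_raw_mb

projected_index_mb_per_day = full_raw_mb_per_day * expansion_ratio
print(f"Expansion ratio: {expansion_ratio:.2f}x")
print(f"Projected index growth: {projected_index_mb_per_day:.0f} MB/day for this source type")
```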

Repeat for the different kinds of data you have. For instance, all IIS or Apache logs will be pretty similar to one another, so the same calculations work. Windows Event logs will be different, though; for those you will pretty much have to pull in a handful of servers to see how much data is really generated.

Either way, take those numbers and multiply them out by how many of those files or servers you have. Repeat until you've accounted for everything you want to pull in.
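To keep the tally straight across several source types, a tiny script works fine. The source names and per-instance numbers here are made-up placeholders; plug in whatever your own measurements gave you:

```python
# Hypothetical per-source estimates (MB/day per file or server) and instance counts.
sources = {
    # source type: (MB/day per instance, number of instances)
    "iis_logs":       (200.0, 12),
    "apache_logs":    (150.0, 8),
    "windows_events": (300.0, 25),
}

total_mb_per_day = sum(per_instance * count for per_instance, count in sources.values())
print(f"Estimated total ingest: {total_mb_per_day:.0f} MB/day "
      f"(~{total_mb_per_day / 1024:.1f} GB/day)")
```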

Or again, call your rep. 🙂

Happy Splunking!
-Rich
