
How to get capacity planning and availability reports in Splunk for servers?

ansif
Motivator


I want to build reports on server availability and capacity planning. Are there any ready-made searches available for these reports?


lakshman239
Influencer

Capacity planning is very broad. With the *nix and Windows TAs, you could monitor and trend CPU, memory, and disk utilisation over a period of time [say daily, monthly, yearly]. Based on your goals, you could then decide to procure additional hardware or disk when you are consistently hitting your threshold [e.g. disk usage is more than 75%].

You can also group performance by application, e.g. web server usage and database usage, and decide to procure servers only where they are needed to increase your scale/availability. A sketch of that kind of trend search is below.
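As a starting point, here is a minimal daily disk-trend sketch, assuming the Windows TA's Perfmon fields (counter, instance, Value) and the Perfmon:FreeDiskSpace sourcetype used later in this thread; the index is a placeholder, so adjust to your environment:

index=xxx sourcetype="Perfmon:FreeDiskSpace" counter="% Free Space" instance!="_Total"
| eval Used_percent=100-Value
| timechart span=1d avg(Used_percent) AS avg_disk_used_pct

To flag hosts that are consistently over a 75% threshold, aggregate by host instead:

index=xxx sourcetype="Perfmon:FreeDiskSpace" counter="% Free Space" instance!="_Total"
| eval Used_percent=100-Value
| stats avg(Used_percent) AS avg_used_pct by host
| where avg_used_pct > 75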


ansif
Motivator

Can I have a sample search to achieve this using average values of CPU, memory, and disk?


skoelpin
SplunkTrust

I actually just built a capacity planning solution for my organization. It uses machine learning to forecast when a server or cluster will run out of disk, and it doubles as a "what if machine": the user can walk through scenarios such as "if I remove 10TB from this cluster, when will it run out of disk?" You can also enter any future date and it will give you the forecasted disk usage at that date.
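For intuition, the "what if" part is just linear-model arithmetic. A minimal sketch with hypothetical field names (slope in GB/day, negative while the disk fills; y_intercept as GB free today):

| eval what_if_free_gb = y_intercept - 10240
| eval days_to_full = -what_if_free_gb / slope

Removing 10TB lowers the intercept by 10240 GB, and with a negative slope the drive hits zero free space after days_to_full days.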

You should first define what your future state will be and what you want to accomplish.


gowtham495
Path Finder

I'm working on a similar problem, @skoelpin. Could you please elaborate on your approach to solving this?
The problem I'm facing is that a single host has multiple mounts (C:, D:, etc.). My approach works well when the server list is small, but as the number of servers increases it becomes difficult.
Thanks in advance.


skoelpin
SplunkTrust

Sure, I had the same problem. We had to figure out a clever way to scale this, and we achieved it through a few methods. We started with a single drive and 5 clusters, 15 servers in total. I created 2 lookup files: the first holds host values to drive the first dropdown, and when the user selects the app, it dynamically populates the second dropdown so the user can select a single host or an aggregate of the cluster. The second lookup holds a row per host with the slope, y-intercept, and drive letter. Any time disk is purged or added, the y-intercept value changes, but the slope remains constant.

When we started to scale, we had to reduce our dependency on the lookups because it was getting difficult to maintain these values across hundreds of servers. We found a way to dynamically populate the slope value and created an additional dropdown for drives so we could handle multiple drives per host.

Another approach we took to match the model name to the selected host value was to use a good naming convention for the model names. If the user selects a hostname in the dropdown, that hostname is passed into the model name, which looks like this: | apply Forecasting_$HOST$.
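A sketch of how that second lookup can drive the forecast, with hypothetical names (a disk_models.csv lookup with host, drive, slope, and y_intercept columns; $HOST$ is the dashboard token mentioned above and $DRIVE$ is an assumed token for the drive dropdown):

| inputlookup disk_models.csv
| search host="$HOST$" drive="$DRIVE$"
| eval days_ahead=30
| eval forecast_free_gb = y_intercept + slope * days_ahead

With a negative slope, forecast_free_gb falls over time, and solving y_intercept + slope * t = 0 gives the number of days until the drive is full.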

One last word of advice: create short feedback loops to judge accuracy. You have to be confident in the results you're getting from the forecast, so dedicating a few panels to accuracy is important.

ansif
Motivator

Can I have a sample search to achieve this using average values of CPU, memory, and disk?


skoelpin
SplunkTrust

Sure. The SPL below just does disk, but you can easily add CPU and memory with additional counters.

index=xxx host=xxx sourcetype="Perfmon:FreeDiskSpace" (counter="% Free Space" OR counter="Free Megabytes") instance=G:
| eval FreeMBytes=if(counter="Free Megabytes", Value, null())
| eval storage_used_percent=if(counter="% Free Space", 100-Value, null())
| eval FreeGB=FreeMBytes/1024
| eval Free_percent=100-storage_used_percent
| timechart span=1d min(FreeGB) AS FreeGB min(Free_percent) AS Free_percent
| eval Used_percent=100-Free_percent
| eval Total_Cap=100*(FreeGB/Free_percent)

Next, I created a timeshift so I could create empty buckets for future values, then fed it into the MLTK to fill the empty buckets with (slope + the previous value) to get forecasted future values. The "what if" part comes from adjusting the y-intercept value.

| makeresults count=100000 
| streamstats count as count 
| eval earliest_time=now() 
| eval time=case(count=100000,relative_time(earliest_time,"+100000d"),count=1,earliest_time) 
| makecontinuous time span=1d 
| eval timeAsANumber=time 
| eval _time=time 
| eval time_human=strftime(time, "%Y-%m-%d %H:%M:%S") 
| fields + time 
| append 
    [| search index=xxx host=xxx sourcetype="Perfmon:FreeDiskSpace" (counter="% Free Space" OR counter="Free Megabytes") instance=G:
    | eval FreeMBytes=if(counter="Free Megabytes", Value, null())
    | eval storage_used_percent=if(counter="% Free Space", 100-Value, null())
    | eval FreeGB=FreeMBytes/1024
    | eval Free_percent=100-storage_used_percent
    | timechart span=1d min(FreeGB) AS FreeGB min(Free_percent) AS Free_percent
    | eval Used_percent=100-Free_percent
    | eval Total_Cap=100*(FreeGB/Free_percent)]
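The MLTK step isn't shown above: after training a model on the historical series, applying it to the combined results fills the empty future buckets. A hedged sketch, assuming the Machine Learning Toolkit's fit and apply commands and a hypothetical model name:

Train once on the historical daily series, using the numeric time field as the predictor:

index=xxx host=xxx sourcetype="Perfmon:FreeDiskSpace" (counter="% Free Space" OR counter="Free Megabytes") instance=G:
| eval FreeMBytes=if(counter="Free Megabytes", Value, null())
| eval FreeGB=FreeMBytes/1024
| timechart span=1d min(FreeGB) AS FreeGB
| eval time=_time
| fit LinearRegression FreeGB from time into Forecasting_host1

Then, at the end of the combined search above:

| eval time=coalesce(time,_time)
| apply Forecasting_host1

The predicted(FreeGB) field carries the forecast into the empty future buckets, and shifting that line's y-intercept is what drives the "what if" scenarios.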

lakshman239
Influencer

Nope. You would need to build one based on your needs.


ansif
Motivator

Do you have any suggestions for capacity planning? I am using both the Unix and Windows add-ons to get memory, CPU, and disk utilization.
