Getting Proper Averages from Summary Index

deastman
Path Finder

As an example, I thought the question and responses in this Splunk Answers post were excellent. I borrowed the formatting idea from the OP and hope it helps: https://answers.splunk.com/answers/48641/summary-index-noob-question.html

First, the summary search:
- search name = "Summary CPU Usage".
- search = "sourcetype="Perfmon:CPU" counter="% Processor Time" instance="_Total" | sitimechart span=5m limit=0 avg(Value) by host".
- start time = "-20m@m" finish time = "-5m@m".
- scheduled to run every 5 minutes.
- alert condition = always.
- alert mode = once per search.
- summary indexing = enabled.
- summary index = "Performance_Summary".
- added fields: "report" = "cpu_usage".

Report search: index=Performance_Summary report="cpu_usage" | timechart span=15m count by host

But this returns so many statistics that the graph is unusable. Also, splitting by host as shown above returns only the name of my search head, not each individual node. I understand that I would need to use orig_host instead, but why is that, and is there a way to change it? Users may not know when they need to do that with summary data.

Thanks!
Dustin

DalJeanis
Legend

Let's start with the host question.

Your underlying query is this:

sourcetype="Perfmon:CPU" counter="% Processor Time" instance="_Total"
| sitimechart span=5m limit=0 avg(Value) by host

The values for host that will be set in the summary index will be the host field that was in the Perfmon:CPU records.

If that data only tracks your search heads, then that is the only thing in your summary index at the moment. To me, that seems unlikely, unless your search heads are set up for performance monitoring and the rest of your hosts are not.

More likely, your search heads are simply the busiest, so their records are the ones that the timechart command prioritizes.

To validate this, pick a couple of non-search-head hosts and run this:

 index=Performance_Summary report="cpu_usage" 
    host="myfirsthost" OR host="mysecondhost"
 | timechart span=15m count by host 

Assuming that shows good data, we can ignore your orig_host question and move on to the big question. If not, we need to backtrack and figure out what is going on with your system monitoring data.
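
Another way to check which host values the summary index actually holds is to count summary events by both host and orig_host. This is only a sketch, assuming the index and report names from the question. Whether orig_host is populated depends on how the summary events were written, so fillnull keeps the rows where it is missing. The counts matter less than which host names show up in each column.

```spl
index=Performance_Summary report="cpu_usage"
| fillnull value="(none)" orig_host
| stats count by host, orig_host
```

If every row shows the search head under host and the real nodes under orig_host, then the reports would need to split by orig_host instead.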


The big question

What are your users doing with the data?

If they are trying to find busy servers, then maybe you need to segment the data a little more.

To make the best data visualization, you always have to put yourself in the role of the person you are making it for.

If I'm trying to find out which servers are being pounded, then maybe I want to see only servers that have more than 75% CPU.
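
A rough sketch of that "only the busy servers" view, assuming the summary was built with sitimechart avg(Value) as in the question, so the report asks for the same avg(Value). The where clause on timechart keeps only the series whose peak 15-minute average exceeds 75, and useother=f drops the catch-all OTHER series. The 75 threshold and the 15-minute span are just the numbers from this thread, not recommendations.

```spl
index=Performance_Summary report="cpu_usage"
| timechart span=15m limit=0 useother=f avg(Value) by host where max > 75
```

If host only contains the search head in your summary data, substituting orig_host in the by clause may be needed.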

If I'm trying to find out how my overall processes are running, maybe I want to see a summary of how many servers are running at each 10% increment (therefore ten lines). Or maybe I want <25% blue, 25-50% green, 50-75% yellow, 75-90% orange, 90%+ red.
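
The "how many servers at each 10% increment" idea could be sketched roughly like this, again assuming the summary index and field names from the question. The first timechart recovers the per-host average, untable turns the chart back into one row per time and host, and the eval assigns each row to a band (0, 10, ... 90), with an exact 100% folded into the 90 band. The field names avg_cpu and cpu_band are made up for illustration.

```spl
index=Performance_Summary report="cpu_usage"
| timechart span=15m limit=0 avg(Value) by host
| untable _time host avg_cpu
| eval cpu_band=min(floor(avg_cpu/10)*10, 90)
| timechart span=15m limit=0 dc(host) by cpu_band
```

This gives at most ten lines no matter how many hosts report in, which is what keeps the graph readable. The color-coded variant would use wider bands in the eval instead of the fixed 10% steps.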

The key is to always ask why anyone needs to look at the graph in the first place, what's the most important thing they need to know, and what's the next thing they are going to want to do with what they learn.

Once you identify that, then you can work out the data viz that allows them to do their job most easily.

deastman
Path Finder

Per feedback from my end user in this case: "I would be interested in having an average of CPU and memory in use every five minutes and every hour." I asked for further clarification, and the user simply wants an average of CPU utilization per host over a 5-minute window or over a 1-hour window.

I hope this helps clarify the use case.
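
For that use case, a report along these lines may be a reasonable starting point. It is a sketch, assuming the summary search stays as posted (sitimechart span=5m avg(Value) by host). Because the summary was built with avg(Value), the report asks timechart for the same avg(Value). The original report used count, which counts rows rather than averaging CPU. The hourly version is shown here. Changing span=1h to span=5m should give the five-minute view from the same summary data.

```spl
index=Performance_Summary report="cpu_usage"
| timechart span=1h limit=0 avg(Value) by host
```

If host comes back as the search head, splitting by orig_host instead is the usual workaround. Memory is not covered by this summary search, so it would presumably need its own summary built the same way.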
