Splunk Search

Can someone help me with a query relating to averages?

numetheus
Engager

I was wondering if someone could help me with something I am trying to do. I have two extracted fields called metricvalue and metrichost. Metricvalue contains latency values, and metrichost contains the hostname of the node writing the latency.

Currently, I have the following query:
sourcetype="metrics" source="/gmc-logs/prod/metrics.log" serve.pixel.request.get.lat.avg metrichost=pixeltmp* | timechart avg(metricvalue)

This gives me a time chart of latency values so that I can track latency across all nodes over time. If I add "by metrichost" at the end, I get a FEW of the hosts with average latency values, but it seems to sort by hostname rather than putting the higher-latency nodes at the top.

So my questions are ...

1) How do I make it so that the nodes with the higher average values are first on the list, rather than sorting by metrichost?

2) I need to alert if the average latency for the hour is a certain value or higher. I can't seem to get this to work.


sideview
SplunkTrust

Well, the default behavior of the timechart command is to show only 10 values of the split-by field, and by default it picks the ten values that have the largest total area under the curve. So at least in theory, if you're graphing latency, it should already be doing what you want.

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Timechart

You can play around with it more yourself by using the explicit syntax. Here's the fully explicit version of what timechart does by default:

| timechart avg(metricvalue) by metrichost where sum in top10 

Again, since that's the explicit version of the default behavior, it should be exactly the same as

| timechart avg(metricvalue) by metrichost

You can also check out the reverse -- that is, the ten values with the lowest total overall latency -- with the following:

| timechart avg(metricvalue) by metrichost where sum in bottom10 

And possibly someone else on the site will be able to add some ideas about why what you're actually seeing is not the ten metrichost values with the highest latency.
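
In the meantime, if you want a quick sanity check on which hosts actually have the highest average latency, a plain stats plus sort will rank them for you. Here's a sketch using the same fields from your search (the avg_latency name is just for illustration):

sourcetype="metrics" source="/gmc-logs/prod/metrics.log" serve.pixel.request.get.lat.avg metrichost=pixeltmp* | stats avg(metricvalue) as avg_latency by metrichost | sort - avg_latency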

As to your second question, that one's a little easier.

If the search itself is only ever searching an hour's worth of data, and your threshold were, say, 17, you could simply save this as a new alert:

sourcetype="metrics" source="/gmc-logs/prod/metrics.log" serve.pixel.request.get.lat.avg metrichost=pixeltmp* | stats avg(metricvalue) as latency | where latency>17

It would alert you when the search had 1 result, i.e. when the latency was greater than 17, and it would not alert you when the search had 0 results.

If, on the other hand, you want to run the search over, say, 24 hours and be alerted when any one of those hours hits the condition, then you might do this:

sourcetype="metrics" source="/gmc-logs/prod/metrics.log" serve.pixel.request.get.lat.avg metrichost=pixeltmp* | bin _time span="1h" | stats avg(metricvalue) as latency by _time | where latency>17

Of course once it started emailing you, it would keep doing so for 24 hours.
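
If you also want the alert results to tell you which host crossed the threshold, you could split the hourly average by metrichost as well. This is just a sketch along the same lines, assuming the same threshold of 17:

sourcetype="metrics" source="/gmc-logs/prod/metrics.log" serve.pixel.request.get.lat.avg metrichost=pixeltmp* | bin _time span="1h" | stats avg(metricvalue) as latency by _time, metrichost | where latency>17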
