Hi,
We're debugging an issue where disk latency shoots up at a specific time. I would like to create a search which shows the host with the highest latency at any specific minute.
So the base search is:
index=os sourcetype=iostat | multikv fields avgWaitMillis
...but then I'm not sure how to continue. For every minute, I would like to find the host whose avgWaitMillis is the highest.
I think you may want to pipe to the timechart command, which will allow you to gather stats over time. You may be able to do something like:
..| timechart span=1m max(avgWaitMillis) as maxWait
I haven't used a split-by clause (I don't think you'll need one), but if you do need one, just add something like "by someField" (where someField is the field you want to split by).
Please see documentation:
http://docs.splunk.com/Documentation/Splunk/4.3.4/SearchReference/Timechart
http://docs.splunk.com/Documentation/Splunk/4.3.4/SearchReference/CommonStatsFunctions
To elaborate a bit on that sample table:
At time n, the avgWaitMillis of host001 equals max(avgWaitMillis) of all hosts (at that time).
Likewise, at time l, the avgWaitMillis of host219 == max(avgWaitMillis) of all hosts at that time.
Thanks, but that is still not what I'm after. useother only affects the grouping of the hosts in the chart.
timechart is really not the answer here, since I'm not concerned with the values themselves, but with which hosts had the max value at a particular time.
Since I'm primarily interested in the hostnames, a chart is probably not the best visualization. A table would be better, with values roughly like this:
time , host_with_highest_latency
time n, host001.domain.com
time m, host321.domain.com
time l, host219.domain.com
Have you tried adding useother=f (mentioned in the docs), like so:
..| timechart span=1m max(avgWaitMillis) as maxWait by host useother=f
I can't remember how specific the useother boolean needs to be, but you can also try useother=false, or the binary equivalent (e.g. "1" OR "0").
Thank you for your effort to help, nevertheless!
I'm afraid this doesn't do what I want at all.
That will just show the max values of avgWaitMillis, without even mentioning the host.
I want to know which host had the highest latency, not what the highest latency was.
Doing the same by host doesn't help me either: out of the hundred or so hosts, the majority will be lumped into OTHER. So knowing that one host of the 90 in OTHER had the highest latency at 21:15 and 23:30 reveals nothing.
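For what it's worth, one way to get a table of "time, host with the highest latency" may be to skip timechart altogether: compute the per-minute maximum for each host, sort each minute's rows by that value, and keep only the top row per minute. The following is a minimal, untested sketch built on the base search above. It assumes avgWaitMillis is extracted as a numeric field by multikv; the field name wait is only an illustrative alias, and the 1-minute span matches the timechart examples in this thread.

```
index=os sourcetype=iostat
| multikv fields avgWaitMillis
| bucket _time span=1m
| stats max(avgWaitMillis) as wait by _time, host
| sort 0 _time, -wait
| dedup _time
| table _time, host, wait
```

Here bucket rounds each event's _time down to the minute, stats produces one row per minute and host, sort 0 orders all rows (0 lifts the default result limit) by minute and then by descending wait, and dedup _time keeps only the first row for each minute, which is the host with the highest value. Note that if two hosts tie for the maximum in the same minute, only one of them survives the dedup.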