Splunk Search

How much of the time is...

martindalum
Engager

I'm collecting lots of data about a large number of machines with the Linux and Unix TA (but that's a bit irrelevant to this question, other than to give an example).

I would like Splunk to answer questions like "How much of the time does ... match?" - i.e. "How much of the time is cpu_load over 90, by host?"

I'm accomplishing something similar with this search (although this is event-correlated, not time-correlated):

sourcetype=cpu
| multikv fields pctIdle
| eval Percent_CPU_Load = 100 - pctIdle
| stats count(eval(Percent_CPU_Load<90)) AS below, count(eval(Percent_CPU_Load>=90)) AS over by host
| eval all=below+over
| eval TimeOverloaded=tostring(round(over/all*100, 2))+"%"
| table host, TimeOverloaded

This, however, seems like a very tedious way to get at this information. It feels like there should be a simple search command to answer these kinds of questions, like stats, chart, etc., but I can't find it. All data in Splunk is time-correlated, so this should certainly be possible.

If a command like this already exists, I apologize. If not, I would like to request this feature - although I'm at a loss as to how this command should be named 🙂

A search command like this would be very useful when calculating e.g. SLA fulfillment.


martin_mueller
SplunkTrust

You can certainly slim down that query:

sourcetype=cpu
| multikv fields pctIdle
| stats count AS all, count(eval(pctIdle<=10)) AS over by host
| eval TimeOverloaded=tostring(round(over/all*100, 2))+"%"
| table host, TimeOverloaded

martin_mueller
SplunkTrust

You could use streamstats to add the next event's timestamp to each event, calculate the difference, use that as the valid duration for the event, and hence get an approximation of the time during which your CPU load was greater than 90%.
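
Something along these lines (untested sketch - it assumes events come back per host in the default newest-first order, so the previous event in the stream is the next one in time):

sourcetype=cpu
| multikv fields pctIdle
| streamstats current=f window=1 last(_time) AS next_time by host
| eval duration = next_time - _time
| stats sum(duration) AS total_time, sum(eval(if(pctIdle<=10, duration, 0))) AS overloaded_time by host
| eval TimeOverloaded = tostring(round(overloaded_time/total_time*100, 2))+"%"
| table host, TimeOverloaded

Each event's duration is the gap until the next sample from that host, and the overloaded time is the sum of those gaps where pctIdle was at or below 10.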

Not having a dedicated command makes this special case a bit harder to build, but having powerful generic commands is what makes it possible in the first place. Imagine how many very specific commands there would have to be to cover every possible eventuality.

martindalum
Engager

You are right... my query is rather long 🙂

I still think Splunk could use a dedicated command to accomplish this kind of thing more generally.

There's also a big difference: this search counts the percentage of logged events where the host(s) are overloaded, not the percentage of the time during which the host(s) are overloaded.

A dedicated command could, e.g., take into account how time slots with missing data should be handled. Using addinfo to get the search start and end time and then calculating how many data points I should have makes the search even more complicated 🙂 See the sketch below for what I mean.
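
Something like this, just as a rough sketch - it assumes a fixed 60-second collection interval, an explicit search time range, and counts missing intervals as not overloaded:

sourcetype=cpu
| multikv fields pctIdle
| addinfo
| stats count AS actual, count(eval(pctIdle<=10)) AS over, min(info_min_time) AS earliest_t, max(info_max_time) AS latest_t by host
| eval expected = round((latest_t - earliest_t) / 60, 0)
| eval missing = expected - actual
| eval TimeOverloaded = tostring(round(over/expected*100, 2))+"%"
| table host, TimeOverloaded, missing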
