
Can I use "sistats median(x)" to build a histogram of x?

Lowell
Super Champion

If I have a summary indexing search like this:

.... | sistats median(x)

I get a list of values and counts in a field called psrsvd_rd_x, which contains values like this:

0e+00:9;3e+00:1;4e+00:1;6e+00:9;...

This seems to be a semicolon-separated list of value:count pairs (each value and its count are separated by a colon). So the value "0" occurs 9 times, "3" and "4" each occur once, "6" occurs 9 times, and so on.
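To make that reading of the format concrete, here is a minimal Python sketch that parses such a string outside of Splunk and prints a crude text histogram. It assumes the field really is nothing more than `value:count` pairs joined by semicolons, as guessed above; the function name `parse_value_counts` is hypothetical, and the sample string is just the visible part of the field contents with the elided tail left out.

```python
from collections import Counter


def parse_value_counts(serialized):
    """Parse 'value:count;value:count' text into a Counter keyed by float value.

    Assumption: the field holds only semicolon-separated value:count pairs.
    """
    counts = Counter()
    for pair in serialized.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing semicolon
        value_text, count_text = pair.split(":")
        counts[float(value_text)] += int(count_text)
    return counts


# Visible part of the sample field contents; the truncated remainder is omitted.
sample = "0e+00:9;3e+00:1;4e+00:1;6e+00:9"
histogram = parse_value_counts(sample)

# One row per distinct value, one '#' per occurrence.
for value in sorted(histogram):
    print(f"{value:g}\t{'#' * histogram[value]}")
```

This only demonstrates that the stored pairs are enough to draw a histogram; it says nothing about whether a search command can expose them.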

So I'm wondering if I can use this information to build a histogram of the values of x. It seems like this should be possible, at least in theory, since Splunk is already storing counts of my distinct values (which is essentially the definition of a histogram).

Has anyone been able to do this? I've tried a few searches but haven't had any success so far. Are there any gotchas with the way the sistats command summarizes this information that would cause trouble if I tried to graph this as a histogram? (In this particular case, the number of possible distinct values for x is fairly small; there are probably fewer than 50 distinct values for any given summarized period.)


Yes, I know I could make a second summary-index-generating search that stores counts by value, but I already have a summary index search that calculates median(x), so I was thinking I could leverage the events that are already in my summary index.


gkanapathy
Splunk Employee

sistats automatically stores the "minimum statistics" required to be able to create aggregates of the function you're specifying. For example, you may store median(x) hourly, but sistats will store enough to be able to compute median(x) daily, weekly, or over any other period. In the case of median() and any percentile function, this is the same info (so you could in fact get perc95(), perc5(), etc., out of data that was generated using only median()). This also happens to be almost the same information as is stored for distinct_count(x), with the difference that the percentile functions can assume numeric data, and can (and will) thus compress the representation and discard precision to save space, while dc won't.
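As a rough illustration of why one set of value counts is enough for any percentile over any larger period, here is a small Python sketch. It is not Splunk's internal algorithm: the nearest-rank percentile rule, the function name `percentile_from_counts`, and the second hour's counts are all assumptions made for the example. The first hour reuses the counts visible in the question.

```python
import math
from collections import Counter


def percentile_from_counts(counts, pct):
    """Nearest-rank percentile over a value -> count mapping (assumed rule)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations")
    # Rank of the observation to return, 1-based.
    rank = max(1, math.ceil(pct * total / 100))
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if seen >= rank:
            return value
    raise AssertionError("rank exceeds total count")


# Counts for one hour, taken from the question's sample field.
hour1 = Counter({0.0: 9, 3.0: 1, 4.0: 1, 6.0: 9})
# Hypothetical counts for a second hour, invented for this example.
hour2 = Counter({0.0: 2, 3.0: 5, 6.0: 1})

# Merging periods is just adding the per-value counts together.
merged = hour1 + hour2

for pct in (5, 50, 95):
    print(pct, percentile_from_counts(hour1, pct), percentile_from_counts(merged, pct))
```

Because the counts add across periods, the same stored data serves the median, perc95(), perc5(), and so on. Any precision that was discarded when the values were compressed would carry through to these results.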

However, I don't believe that the built-in stats or other functions will allow you to enumerate out the values and counts that are stored by sistats that way.
