Splunk Search

Efficiency of comparing against summary statistics

tlagatta_splunk
Splunk Employee
Splunk Employee

I have a table with attributes ProductName and TotalSales, and I would like to extract the rows which are in the top 50% of total sales. Naively, I would pipe this into search TotalSales>=median(TotalSales). However, since search doesn't support the median function, Splunk returns no events.

I can make this work via the following hack:

| eventstats median(TotalSales) as MTS | where TotalSales>=MTS | fields - MTS

I'm worried about the efficiency of this hack. If I omit the fields - MTS command, then the output is a table with attribute MTS, with the median value replicated across all rows. If I have only 20 products, then this isn't that big of a deal, but if I have 500,000 products, then this is an enormous amount of redundancy in memory.

My question: what is Splunk doing under the hood? That is,

  • is Splunk literally replicating the median value in memory dc(ProductName) of times, then deleting it once I remove the MTS field?
  • Or, is Splunk smart enough to use the median value just once, and only replicate it only if I insist on viewing the entire table with the MTS field?
0 Karma

lguinn2
Legend

That is not a hack; it is a valid approach. Here is an alternate approach, which I believe will be slower, not faster:

yoursearchhere
| eval MTS = [ yoursearchhere-again | stats median(TotalSales) as query ]
| where TotalSales >= MTS

But you can test and see...

lguinn2
Legend

That information is not published, and I don't think it is the sort of thing that Splunk will reveal. However, it does seem silly to replicate the number in memory; I would guess that it is optimized.

0 Karma

tlagatta_splunk
Splunk Employee
Splunk Employee

Thanks, Lisa! That's an interesting workaround. However, it doesn't quite answer my "under the hood" question.

Suppose that we have 500,000 products. I'd like to know whether Splunk is literally storing the median value 500,000 times during the piping (replicated across each product name), or whether the replication happens purely at the end of the pipeline (when the visual table is being generated).

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...