Splunk Search

equivalence of sql join of two different group bys

eyaler
Explorer

i have data of the form:
day, hour, seller, buyer

i want to find all instances where a seller appears only on a single day and sells to more than 10 different buyers in the same hour.

in sql i would do:
select a.* from (select day, hour, seller, count(distinct buyer) as cnt_buy from sales group by day, hour, seller) a join (select seller, count(distinct day) as cnt_date from sales group by seller) b on a.seller=b.seller where cnt_buy>10 and cnt_date=1

clarification: I would like to combine the two following queries:

(1) find all sellers which appear only on single day:

| stats dc(day) as cnt_date by seller| where cnt_date=1

(2) find all instances where one seller sold to more than 10 buyers in
one hour:

| stats dc(buyer) as cnt_buy by day, hour, seller | where cnt_buy>10

now i want to join the above two so that only the sellers in (1) are used in (2). what would be an efficient
splunk way to do so?

Tags (4)
1 Solution

somesoni2
Revered Legend

Try this

base search | stats dc(buyer) as buyers by seller, day, hour | eventstats dc(day) as days by seller | where buyers>10 AND days=1

View solution in original post

somesoni2
Revered Legend

Try this

base search | stats dc(buyer) as buyers by seller, day, hour | eventstats dc(day) as days by seller | where buyers>10 AND days=1

eyaler
Explorer

looks good. it would however seem more efficient to first filter sellers by the second condition (days=1) and only then do the hourly counts on those sellers. could you provide a way to do it?

0 Karma

rsennett_splunk
Splunk Employee
Splunk Employee

@somesoni2: I was actually thinking sub search given the clarification... but I'm stuck on the time factor at the moment...

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

rsennett_splunk
Splunk Employee
Splunk Employee

This may seem oversimplified... but if your data in Splunk holds day,hour,seller,buyer fields for each event, all you would have to do is group the fields, so that you were counting the "buy" and sorting by the other stuff so it narrowed down by seller:

|stats count by day hour seller|where count>=10

In this case, the "buyer" is present on all of them... and how you order the fields after the 'by' will determine what exactly you are counting. In other words, the last field in the list:seller is counted per hour, per day

the transition from thinking in terms of rows and columns to thinking in terms of events and how to manipulate them, does take a bit of acclimation...

If your data is really in two physical data sources, you need to show the data more clearly so we can give you the syntax but unless I'm mistaken, the join was recursive so you could get the count. (my sql is quite rusty at this point). In this case, you are counting the events so you don't have to create something to represent the "count". It's more of a by product of the calculation...

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

eyaler
Explorer

not sure we understand each other. essentially i want to combine the following queries:
1. find all sellers which appear only on single day:
| stats dc(day) as cnt_date by seller| where cnt_date=1
2. find all instances where one seller sold to more than 10 buyers in one hour:
| stats dc(buyer) as cnt_buy by day, hour, seller | where cnt_buy>10
now i want to join the above two so that only the sellers in (1) are used in (2).
what would be an efficient splunk way to do so?

0 Karma
Get Updates on the Splunk Community!

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...