Hello, we've run into a problem with results being truncated when using the join command. The limits on join's maximum results are extremely disappointing; we keep hitting them here and there, and changing parameters in limits.conf doesn't help anymore. One of our searches stopped working correctly because of these limits, so now I'm trying to rewrite the problem search to avoid using join.
The search should find users who made their first monetary purchase after making a promo purchase. Each user can have multiple promo and monetary purchases, and promo purchases can belong to several promo offers.
How it was done before:
source="my_source"
| where isnotnull(offer_uid)
| stats min(_time) as promo_purchase_time by user_uid offer_uid
| join [search source="my_source" tx_amount>0
| stats min(_time) as monetary_purchase_time by user_uid ]
| where promo_purchase_time < monetary_purchase_time
| stats count as Customers by offer_uid
What I'm trying to make:
source="my_source"
| eval monetary_purchase_time=if(tx_amount>0, _time, null())
| eval offer_name=if(isnotnull(offer_uid), offer_uid, null())
| eval offer_purchase_time=if(isnotnull(offer_uid), _time, null())
| stats min(monetary_purchase_time) as min_monetary_purchase_time, values(offer_purchase_time) as offer_purchase_time by user_uid offer_name _time
I get this:
_time user_uid min_monetary_purchase_time offer_name offer_purchase_time
12/10/12 9:06:33.017 PM 727889 OFFER1 1355159193.017
12/10/12 9:11:08.225 PM 727889 OFFER1 1355159468.225
12/10/12 9:22:36.926 PM 727889 OFFER2 1355160156.926
12/14/12 9:42:17.085 PM 727889 OFFER2 1355506937.085
12/15/12 8:58:13.862 PM 727889 1355590693.862
I want to get a plain table for each user_uid and his promo offers, in which I can compare his first monetary and promo purchase times.
When I change last stats command to this one:
| stats min(monetary_purchase_time) as min_monetary_purchase_time, min(offer_purchase_time) as min_offer_purchase_time by user_uid offer_name
..I get this:
user_uid min_monetary_purchase_time offer_name min_offer_purchase_time
727889 OFFER1 1355159193.017
727889 OFFER2 1355160156.926
But I want to get this:
user_uid min_monetary_purchase_time offer_name min_offer_purchase_time
727889 1355590693.862 OFFER1 1355159193.017
727889 1355590693.862 OFFER2 1355160156.926
with which I can then use a desired command:
| stats count(eval(min_monetary_purchase_time > min_offer_purchase_time)) as Customers by user_uid offer_name
How can I fill the empty min_monetary_purchase_time fields for each user with the value of his first monetary purchase time?
I've tried variants with chart and filldown, but got incorrect values.
It looks like you've got two different types of event message:
one contains the promo offers and times, and the other contains the real purchases.
The only thing that ties all this data together is the user_uid, so that's the only field you can use in a stats 'by' clause without splitting the data up.
The problem with that is that you also need the offer name and time on each row, so put all the promo stuff into one field:
| eval promo_stuff=offer_name.";".offer_purchase_time
Then do a stats to get the min_monetary_purchase_time and a list of all the promo stuff 'by' user_uid:
| stats min(monetary_purchase_time) as min_monetary_purchase_time, values(promo_stuff) as promo_stuff by user_uid
You can then expand promo_stuff so that the user_uid and min_monetary_purchase_time are set in each of the new events:
| mvexpand promo_stuff
| rex field=promo_stuff "(?<offer_name>[^;]+);(?<offer_purchase_time>.*)"
| fields - promo_stuff
From here you can just do stats again to get the min offer time.
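For completeness, the closing pipes might look something like this (a sketch using the field names from above; since min_monetary_purchase_time is constant per user after the mvexpand, it can safely appear in the 'by' clause):
| stats min(offer_purchase_time) as min_offer_purchase_time by user_uid offer_name min_monetary_purchase_time
| stats count(eval(min_monetary_purchase_time > min_offer_purchase_time)) as Customers by offer_name
The count(eval(...)) idiom counts only the rows where the comparison is true, which gives the per-offer customer count directly.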
No problem. Looks like we have similar data and reports to create 😉
@jonuwz, you're a genius! I realized the problems you stated at the beginning of your answer but didn't think of combining the offer fields like that. Thank you very much!
Here's my suggestion. I'm probably making some assumptions that are wrong, but even so hopefully you can get the idea.
source="my_source"
| eventstats min(_time) as promo_purchase_time by user_uid offer_uid
| where tx_amount>0 OR isnotnull(offer_uid)
| eval purchase_time=if(tx_amount>0,_time,null())
| eventstats min(purchase_time) as monetary_purchase_time by user_uid
| where promo_purchase_time < monetary_purchase_time
| stats dc(user_uid) as Customers by offer_uid
The eventstats command is very useful because it does a whole pass over the entire dataset and calculates aggregate stats, optionally segmented with a "by user_uid offer_uid" clause as you see here, but then it paints those aggregate stats back onto the original events, which are otherwise unaltered. Using eventstats with a "by" clause is very powerful and a good tool for doing advanced things like this the "Splunk" way, without joins and appends.
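As a toy illustration of the difference (hypothetical field name first_purchase_time):
source="my_source"
| stats min(_time) as first_purchase_time by user_uid
collapses everything down to one row per user_uid, whereas
source="my_source"
| eventstats min(_time) as first_purchase_time by user_uid
keeps every original event and simply adds a first_purchase_time field to each of them, so later pipes can still filter or compare against the raw events.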
Ultimately, though, you have to build and troubleshoot these searches pipe by pipe, testing as you go, making sure that the rows of key-value pairs output by a given command match the assumptions the next command makes about its incoming rows.
@sideview thanks for the suggestion. It seems the key to my problem was what @jonuwz described. I haven't used the eventstats command yet, maybe because it still isn't fully clear to me when it's better to use eventstats instead of stats.