Hello, we've run into a problem with results being truncated when using the join command. The limits on join's maximum results are extremely disappointing; we keep hitting them here and there, and changing parameters in limits.conf doesn't help anymore. One of our searches stopped working correctly because of these limits, so now I'm trying to rewrite the problem search to avoid using join.
The search should find users who made their first monetary purchase after making a promo purchase. Each user can have multiple promo and monetary purchases, and promo purchases can belong to several promo offers.
How it was done before:
source="my_source"
| where isnotnull(offer_uid)
| stats min(_time) as promo_purchase_time by user_uid offer_uid
| join [search source="my_source" tx_amount>0
| stats min(_time) as monetary_purchase_time by user_uid ]
| where promo_purchase_time < monetary_purchase_time
| stats count as Customers by offer_uid
What I'm trying to make:
source="my_source"
| eval monetary_purchase_time=if(tx_amount>0, _time, null())
| eval offer_name=if(isnotnull(offer_uid), offer_uid, null())
| eval offer_purchase_time=if(isnotnull(offer_uid), _time, null())
| stats min(monetary_purchase_time) as min_monetary_purchase_time, values(offer_purchase_time) as offer_purchase_time by user_uid offer_name _time
I get this:
_time user_uid min_monetary_purchase_time offer_name offer_purchase_time
12/10/12 9:06:33.017 PM 727889 OFFER1 1355159193.017
12/10/12 9:11:08.225 PM 727889 OFFER1 1355159468.225
12/10/12 9:22:36.926 PM 727889 OFFER2 1355160156.926
12/14/12 9:42:17.085 PM 727889 OFFER2 1355506937.085
12/15/12 8:58:13.862 PM 727889 1355590693.862
I want to get a plain table for each user_uid and his promo offers, in which I can compare his first monetary and promo purchase times.
When I change last stats command to this one:
| stats min(monetary_purchase_time) as min_monetary_purchase_time, min(offer_purchase_time) as min_offer_purchase_time by user_uid offer_name
..I get this:
user_uid min_monetary_purchase_time offer_name min_offer_purchase_time
727889 OFFER1 1355159193.017
727889 OFFER2 1355160156.926
But I want to get this:
user_uid min_monetary_purchase_time offer_name min_offer_purchase_time
727889 1355590693.862 OFFER1 1355159193.017
727889 1355590693.862 OFFER2 1355160156.926
with which I can then use a desired command:
| stats count(eval(min_monetary_purchase_time > min_offer_purchase_time)) as Customers by user_uid offer_name
How can I fill the empty min_monetary_purchase_time fields for each user with the value of his first monetary purchase time?
I've tried variants with chart and filldown, but got incorrect values.
It looks like you've got two different types of event message:
one contains the promo offers and times, and the other contains the real purchases.
The only thing that ties all this data together is the user_uid, so that's the only field you can use in a stats 'by' clause without splitting the data up.
The problem with that is that you also need the offer name and time on each row, so put all the promo stuff into one field:
| eval promo_stuff=offer_name.";".offer_purchase_time
Then do a stats to get the min_monetary_purchase_time and a list of all the promo stuff 'by' user_uid:
| stats min(monetary_purchase_time) as min_monetary_purchase_time, values(promo_stuff) as promo_stuff by user_uid
You can then expand promo_stuff so that the user_uid and min_monetary_purchase_time are set in each of the new events:
| mvexpand promo_stuff
| rex field=promo_stuff "(?<offer_name>[^;]+);(?<offer_purchase_time>.*)"
| fields - promo_stuff
From here you can just do stats again to get the min offer time.
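For completeness, the closing pipes might look something like this (a sketch using the field names from above; since min_monetary_purchase_time is constant per user after the mvexpand, it can safely appear in the 'by' clause):
| stats min(offer_purchase_time) as min_offer_purchase_time by user_uid offer_name min_monetary_purchase_time
| stats count(eval(min_monetary_purchase_time > min_offer_purchase_time)) as Customers by offer_name
The count(eval(...)) idiom counts only the rows where the comparison is true, which gives the per-offer customer count directly.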
No problem. Looks like we have similar data and reports to create 😉
@jonuwz, you're a genius! I realized the problems you stated at the beginning of your answer but didn't think of combining the offer fields like that. Thank you very much!
Here's my suggestion. I'm probably making some assumptions that are wrong, but even so hopefully you can get the idea.
source="my_source"
| eventstats min(_time) as promo_purchase_time by user_uid offer_uid
| where tx_amount>0 OR isnotnull(offer_uid)
| eval purchase_time=if(tx_amount>0,_time,null())
| eventstats min(purchase_time) as monetary_purchase_time by user_uid
| where promo_purchase_time < monetary_purchase_time
| stats dc(user_uid) as Customers by offer_uid
The eventstats command is very useful because it does a whole pass over the entire dataset and calculates aggregate stats, optionally segmented with a "by user_uid offer_uid" clause as you see here, but then it paints those aggregate stats back onto the original events, which are otherwise unaltered. Using eventstats with a "by" clause is very powerful and a good tool for doing advanced things like this the "Splunk" way, without joins and appends.
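As a toy illustration of the difference (hypothetical field name first_purchase_time):
source="my_source"
| stats min(_time) as first_purchase_time by user_uid
collapses everything down to one row per user_uid, whereas
source="my_source"
| eventstats min(_time) as first_purchase_time by user_uid
keeps every original event and simply adds a first_purchase_time field to each of them, so later pipes can still filter or compare against the raw events.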
Ultimately, though, you have to build and troubleshoot these searches pipe by pipe, testing as you go, making sure that the rows of key-value pairs output by a given command match the assumptions the next command makes about its incoming rows.
@sideview thanks for the suggestion. It seems the key to my problem was what @jonuwz described. I haven't used the eventstats command yet, maybe because it still isn't fully clear to me when it's better to use eventstats instead of stats.