Splunk Search

Ignore pairs of "matching" (not identical) events

woodcock
Esteemed Legend

I have a stream of events where a user has an activity={purchase, return, subscribe, unsubscribe} and product={prodA, prodB}. I have 2 systems (sourcetype={sourceA, sourceB}) that should agree 100% on these events but they do not. I used the following crude search to look for mismatches:

| stats count by user, product, activity | eval odd = count%2 | where odd==1 | stats list(activity) AS activities BY user,product| eval numActivities=mvcount(activities) | mvcombine activities | stats count by activities

This works pretty well but this search has shown me that one of the data sources has phantom events that cancel eachother out and which I need to ignore. For example, in this data set:


sourceA:
user1 purchase prodA
user1 return prodA
user1 purchase prodA
sourceB:
user1 purchase prodA

My search above would find a false positive as:
purchase,return

Keep in mind that the following data set results in a similar "activities" stat but, unlike previous (which is actually not a problem) indicates a problem:


sourceA:
user1 purchase prodA
user1 return prodA
user1 purchase prodB
sourceB:
user1 purchase prodA

I tried to use sets like the search below but it always times out:


set diff [search sourcetype="sourceA" | fields user,product,activity | fields - _*] [search sourcetype="sourceB" | fields user,product,activity | fields - _*]

The bottom line is that I need to pull out the differences between 2 sets of data but AFTER having filtered out some pairs of events from one of the sets (for example any user that has a cancelling-out pair of events like "purchase prodA + return prodA"). I know Splunk can do this but I am struggling...

Tags (1)
0 Karma
1 Solution

woodcock
Esteemed Legend

OK, I figured this out!

First, subsearches limit the output to only 100 values/events so this is out for a case like mine.

Second, the "set" functions (union/intersect) won't work for me because we have some special cases that aren't apple-to-apple; we need to say an apple and orange in one data set matches a kumquat in the other set.

So here is how I solved my problem:

First, after any other normalization, we create a new combination field and break this out by user:


|eval st_product_activity=sourcetype . "_" . product . "_" . activity
|stats list(st_product_activity) AS st_product_activities count by user

Next, we count the number of each type of event; because "mvcount" returns NULL instead of zero, we have to do some accommodating to ensure our "num_*" variables always contain an integer:


|eval num_sourceA_prodA_subscribe = mvcount(mvfilter(st_product_activities = "sourceA_prodA_subscribe"))
|eval num_sourceA_prodA_subscribe = if(isnull(num_sourceA_prodA_subscribe), 0, num_sourceA_prodA_subscribe)
|eval num_sourceA_prodA_unsubscribe = mvcount(mvfilter(st_product_activities = "sourceA_prodA_unsubscribe"))
|eval num_sourceA_prodA_unsubscribe = if(isnull(num_sourceA_prodA_unsubscribe), 0, num_sourceA_prodA_unsubscribe)
|eval num_sourceA_prodA_purchase = mvcount(mvfilter(st_product_activities = "sourceA_prodA_purchase"))
|eval num_sourceA_prodA_purchase = if(isnull(num_sourceA_prodA_purchase), 0, num_sourceA_prodA_purchase)
|eval num_sourceA_prodA_return = mvcount(mvfilter(st_product_activities = "sourceA_prodA_return"))
|eval num_sourceA_prodA_return = if(isnull(num_sourceA_prodA_return), 0, num_sourceA_prodA_return)

|eval num_sourceA_prodB_subscribe = mvcount(mvfilter(st_product_activities = "sourceA_prodB_subscribe"))
|eval num_sourceA_prodB_subscribe = if(isnull(num_sourceA_prodB_subscribe), 0, num_sourceA_prodB_subscribe)
|eval num_sourceA_prodB_unsubscribe = mvcount(mvfilter(st_product_activities = "sourceA_prodB_unsubscribe"))
|eval num_sourceA_prodB_unsubscribe = if(isnull(num_sourceA_prodB_unsubscribe), 0, num_sourceA_prodB_unsubscribe)
|eval num_sourceA_prodB_purchase = mvcount(mvfilter(st_product_activities = "sourceA_prodB_purchase"))
|eval num_sourceA_prodB_purchase = if(isnull(num_sourceA_prodB_purchase), 0, num_sourceA_prodB_purchase)
|eval num_sourceA_prodB_return = mvcount(mvfilter(st_product_activities = "sourceA_prodB_return"))
|eval num_sourceA_prodB_return = if(isnull(num_sourceA_prodB_return), 0, num_sourceA_prodB_return)

|eval num_sourceB_prodA_subscribe = mvcount(mvfilter(st_product_activities = "sourceB_prodA_subscribe"))
|eval num_sourceB_prodA_subscribe = if(isnull(num_sourceB_prodA_subscribe), 0, num_sourceB_prodA_subscribe)
|eval num_sourceB_prodA_unsubscribe = mvcount(mvfilter(st_product_activities = "sourceB_prodA_unsubscribe"))
|eval num_sourceB_prodA_unsubscribe = if(isnull(num_sourceB_prodA_unsubscribe), 0, num_sourceB_prodA_unsubscribe)
|eval num_sourceB_prodA_purchase = mvcount(mvfilter(st_product_activities = "sourceB_prodA_purchase"))
|eval num_sourceB_prodA_purchase = if(isnull(num_sourceB_prodA_purchase), 0, num_sourceB_prodA_purchase)
|eval num_sourceB_prodA_return = mvcount(mvfilter(st_product_activities = "sourceB_prodA_return"))
|eval num_sourceB_prodA_return = if(isnull(num_sourceB_prodA_return),0, num_sourceB_prodA_return)

|eval num_sourceB_prodB_subscribe = mvcount(mvfilter(st_product_activities = "sourceB_prodB_subscribe"))
|eval num_sourceB_prodB_subscribe = if(isnull(num_sourceB_prodB_subscribe), 0, num_sourceB_prodB_subscribe)
|eval num_sourceB_prodB_unsubscribe = mvcount(mvfilter(st_product_activities = "sourceB_prodB_unsubscribe"))
|eval num_sourceB_prodB_unsubscribe = if(isnull(num_sourceB_prodB_unsubscribe), 0, num_sourceB_prodB_unsubscribe)
|eval num_sourceB_prodB_purchase = mvcount(mvfilter(st_product_activities = "sourceB_prodB_purchase"))
|eval num_sourceB_prodB_purchase = if(isnull(num_sourceB_prodB_purchase), 0, num_sourceB_prodB_purchase)
|eval num_sourceB_prodB_return = mvcount(mvfilter(st_product_activities = "sourceB_prodB_return"))
|eval num_sourceB_prodB_return = if(isnull(num_sourceB_prodB_return),0, num_sourceB_prodB_return)

Then, by order of precedence (highest precedence processed first at the top), we handle all our apples-and-oranges-to-kumquat cases by reducing the count of each cancelled-out event by 1 (because we are forced to use eval, the logic for this is queer and can be done many ways but I believe my method is the quickest, if perhaps not the most obvious):


|eval special_val=if((num_sourceB_prodA_subscribe>0 AND num_sourceB_prodB_purchase>0 AND num_sourceA_prodA_subscribe>0 AND num_sourceA_prodB_subscribe>0),1,0)
|eval num_sourceB_prodA_subscribe = num_sourceB_prodA_subscribe - special_val
|eval num_sourceB_prodB_purchase = num_sourceB_prodB_purchase - special_val
|eval num_sourceA_prodA_subscribe = num_sourceA_prodA_subscribe - special_val
|eval num_sourceA_prodB_subscribe = num_sourceA_prodB_subscribe - special_val
|eval special_val=if((num_sourceB_prodB_subscribe>0 AND num_sourceB_prodA_purchase>0 AND num_sourceA_prodB_subscribe>0 AND num_sourceA_prodA_subscribe>0),1,0)
|eval num_sourceB_prodB_subscribe = num_sourceB_prodB_subscribe - special_val
|eval num_sourceB_prodA_purchase = num_sourceB_prodA_purchase - special_val
|eval num_sourceA_prodB_subscribe = num_sourceA_prodB_subscribe - special_val
|eval num_sourceA_prodA_subscribe = num_sourceA_prodA_subscribe - special_val
|eval special_val=if((num_sourceB_prodA_unsubscribe>0 AND num_sourceB_prodB_return>0 AND num_sourceA_prodA_unsubscribe>0 AND num_sourceA_prodB_unsubscribe>0),1,0)
|eval num_sourceB_prodA_unsubscribe = num_sourceB_prodA_unsubscribe - special_val
|eval num_sourceB_prodB_return = num_sourceB_prodB_return - special_val
|eval num_sourceA_prodA_unsubscribe = num_sourceA_prodA_unsubscribe - special_val
|eval num_sourceA_prodB_unsubscribe = num_sourceA_prodB_unsubscribe - special_val
|eval special_val=if((num_sourceB_prodB_unsubscribe>0 AND num_sourceB_prodA_return>0 AND num_sourceA_prodB_unsubscribe>0 AND num_sourceA_prodA_unsubscribe>0),1,0)
|eval num_sourceB_prodB_unsubscribe = num_sourceB_prodB_unsubscribe - special_val
|eval num_sourceB_prodA_return = num_sourceB_prodA_return - special_val
|eval num_sourceA_prodB_unsubscribe = num_sourceA_prodB_unsubscribe - special_val
|eval num_sourceA_prodA_unsubscribe = num_sourceA_prodA_unsubscribe - special_val
|eval special_val=if((num_sourceA_prodB_return>0 AND num_sourceA_prodA_unsubscribe>0 AND num_sourceA_prodB_subscribe>0 AND num_sourceA_prodA_purchase>0),1,0)
|eval num_sourceA_prodB_return = num_sourceA_prodB_return - special_val
|eval num_sourceA_prodA_unsubscribe = num_sourceA_prodA_unsubscribe - special_val
|eval num_sourceA_prodB_subscribe = num_sourceA_prodB_subscribe - special_val
|eval num_sourceA_prodA_purchase = num_sourceA_prodA_purchase - special_val
|eval special_val=if((num_sourceA_prodA_return>0 AND num_sourceA_prodB_unsubscribe>0 AND num_sourceA_prodA_subscribe>0 AND num_sourceA_prodB_purchase>0),1,0)
|eval num_sourceA_prodA_return = num_sourceA_prodA_return - special_val
|eval num_sourceA_prodB_unsubscribe = num_sourceA_prodB_unsubscribe - special_val
|eval num_sourceA_prodA_subscribe = num_sourceA_prodA_subscribe - special_val
|eval num_sourceA_prodB_purchase = num_sourceA_prodB_purchase - special_val

Continuing, we handle the more obvious and straightforward cancellations/reductions by reducing the count of each cancelled-out event, one of them drops to zero and the other one drops to the difference of the 2 (again, because we are forced to use eval, the logic for this is queer and can be done many ways but I believe my method is the quickest, if perhaps not the most obvious):

First we eliminate exactly matching events in both data sets:


|eval num_sourceA_prodA_subscribe = max((num_sourceA_prodA_subscribe - num_sourceB_prodA_subscribe), 0)
|eval num_sourceB_prodA_subscribe = max((0 - num_sourceA_prodA_subscribe), 0)
|eval num_sourceA_prodA_unsubscribe = max((num_sourceA_prodA_unsubscribe - num_sourceB_prodA_unsubscribe), 0)
|eval num_sourceB_prodA_unsubscribe = max((0 - num_sourceA_prodA_unsubscribe), 0)
|eval num_sourceA_prodA_purchase = max((num_sourceA_prodA_purchase - num_sourceB_prodA_purchase), 0)
|eval num_sourceB_prodA_purchase = max((0 - num_sourceA_prodA_purchase), 0)
|eval num_sourceA_prodA_return = max((num_sourceA_prodA_return - num_sourceB_prodA_return), 0)
|eval num_sourceB_prodA_return = max((0 - num_sourceA_prodA_return), 0)

|eval num_sourceA_prodB_subscribe = max((num_sourceA_prodB_subscribe - num_sourceB_prodB_subscribe), 0)
|eval num_sourceB_prodB_subscribe = max((0 - num_sourceA_prodB_subscribe), 0)
|eval num_sourceA_prodB_unsubscribe = max((num_sourceA_prodB_unsubscribe - num_sourceB_prodB_unsubscribe), 0)
|eval num_sourceB_prodB_unsubscribe = max((0 - num_sourceA_prodB_unsubscribe), 0)
|eval num_sourceA_prodB_purchase = max((num_sourceA_prodB_purchase - num_sourceB_prodB_purchase), 0)
|eval num_sourceB_prodB_purchase = max((0 - num_sourceA_prodB_purchase), 0)
|eval num_sourceA_prodB_return = max((num_sourceA_prodB_return - num_sourceB_prodB_return), 0)
|eval num_sourceB_prodB_return = max((0 - num_sourceA_prodB_return), 0)

Next, we eliminate "do-nothing" pairs (e.g. a purchase + a refund) of events in each data set:


|eval num_sourceA_prodA_unsubscribe = max((num_sourceA_prodA_unsubscribe - num_sourceA_prodA_subscribe), 0)
|eval num_sourceA_prodA_subscribe = max((0 - num_sourceA_prodA_unsubscribe), 0)
|eval num_sourceA_prodA_return = max((num_sourceA_prodA_return - num_sourceA_prodA_purchase), 0)
|eval num_sourceA_prodA_purchase = max((0 - num_sourceA_prodA_return), 0)

|eval num_sourceA_prodB_unsubscribe = max((num_sourceA_prodB_unsubscribe - num_sourceA_prodB_subscribe), 0)
|eval num_sourceA_prodB_subscribe = max((0 - num_sourceA_prodB_unsubscribe), 0)
|eval num_sourceA_prodB_return = max((num_sourceA_prodB_return - num_sourceA_prodB_purchase), 0)
|eval num_sourceA_prodB_purchase = max((0 - num_sourceA_prodB_return), 0)

|eval num_sourceB_prodA_unsubscribe = max((num_sourceB_prodA_unsubscribe - num_sourceB_prodA_subscribe), 0)
|eval num_sourceB_prodA_subscribe = max((0 - num_sourceB_prodA_unsubscribe), 0)
|eval num_sourceB_prodA_return = max((num_sourceB_prodA_return - num_sourceB_prodA_purchase), 0)
|eval num_sourceB_prodA_purchase = max((0 - num_sourceB_prodA_return), 0)

|eval num_sourceB_prodB_unsubscribe = max((num_sourceB_prodB_unsubscribe - num_sourceB_prodB_subscribe), 0)
|eval num_sourceB_prodB_subscribe = max((0 - num_sourceB_prodB_unsubscribe), 0)
|eval num_sourceB_prodB_return = max((num_sourceB_prodB_return - num_sourceB_prodB_purchase), 0)
|eval num_sourceB_prodB_purchase = max((0 - num_sourceB_prodB_return), 0)

At this point, if our data sets were as they should be, every "num_*" variable should have a value of zero.

Last, we build a field with the cancelled out events missing by consulting the "num_" variables we have just manipulated. From this point forward, we only care if each variable is <=0 (don't add it) or >0 (add exactly 1). In other words, in my specific case, let's say we are selling surgeries, it does not make sense for a user to purchase 2 heart surgeries so if we show that there 10 purchases, these are phantom duplications and we only need to show that (at least) one was purchased and cancel out all the duplicates (dedup):


|eval st_product_activities_filtered = NULL

|eval st_product_activities_filtered = if((num_sourceA_prodA_subscribe > 0),
mvappend(st_product_activities_filtered, "sourceA_prodA_subscribe",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceA_prodA_unsubscribe > 0),
mvappend(st_product_activities_filtered, "sourceA_prodA_subscribe",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceA_prodA_purchase > 0),
mvappend(st_product_activities_filtered, "sourceA_prodA_purchase",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceA_prodA_return > 0),
mvappend(st_product_activities_filtered, "sourceA_prodA_return",NULL), st_product_activities_filtered)

|eval st_product_activities_filtered = if((num_sourceA_prodB_subscribe > 0),
mvappend(st_product_activities_filtered, "sourceA_prodB_subscribe",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceA_prodB_unsubscribe > 0),
mvappend(st_product_activities_filtered, "sourceA_prodB_subscribe",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceA_prodB_purchase > 0),
mvappend(st_product_activities_filtered, "sourceA_prodB_purchase",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceA_prodB_return > 0),
mvappend(st_product_activities_filtered, "sourceA_prodB_return",NULL), st_product_activities_filtered)

|eval st_product_activities_filtered = if((num_sourceB_prodA_subscribe > 0),
mvappend(st_product_activities_filtered, "sourceB_prodA_subscribe",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceB_prodA_unsubscribe > 0),
mvappend(st_product_activities_filtered, "sourceB_prodA_subscribe",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceB_prodA_purchase > 0),
mvappend(st_product_activities_filtered, "sourceB_prodA_purchase",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceB_prodA_return > 0),
mvappend(st_product_activities_filtered, "sourceB_prodA_return",NULL), st_product_activities_filtered)

|eval st_product_activities_filtered = if((num_sourceB_prodB_subscribe > 0),
mvappend(st_product_activities_filtered, "sourceB_prodB_subscribe",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceB_prodB_unsubscribe > 0),
mvappend(st_product_activities_filtered, "sourceB_prodB_subscribe",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceB_prodB_purchase > 0),
mvappend(st_product_activities_filtered, "sourceB_prodB_purchase",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceB_prodB_return > 0),
mvappend(st_product_activities_filtered, "sourceB_prodB_return",NULL), st_product_activities_filtered)

Wrapping up, we create a summary of "error types" noting that an "error type" of empty set will disappear from the "stats" restuls which normally is a problem and we take steps to avoid but in this case is a benefit because those are the users where everything cancelled out as it should have!


| fields - num*
| mvcombine st_product_activities_filtered
| stats count(user) AS users by st_product_activities_filtered

Special note, do not use "addcoltotals" because this will prevent you from being able to drill-down by clicking on a line in the search results.

View solution in original post

0 Karma

woodcock
Esteemed Legend

OK, I figured this out!

First, subsearches limit the output to only 100 values/events so this is out for a case like mine.

Second, the "set" functions (union/intersect) won't work for me because we have some special cases that aren't apple-to-apple; we need to say an apple and orange in one data set matches a kumquat in the other set.

So here is how I solved my problem:

First, after any other normalization, we create a new combination field and break this out by user:


|eval st_product_activity=sourcetype . "_" . product . "_" . activity
|stats list(st_product_activity) AS st_product_activities count by user

Next, we count the number of each type of event; because "mvcount" returns NULL instead of zero, we have to do some accommodating to ensure our "num_*" variables always contain an integer:


|eval num_sourceA_prodA_subscribe = mvcount(mvfilter(st_product_activities = "sourceA_prodA_subscribe"))
|eval num_sourceA_prodA_subscribe = if(isnull(num_sourceA_prodA_subscribe), 0, num_sourceA_prodA_subscribe)
|eval num_sourceA_prodA_unsubscribe = mvcount(mvfilter(st_product_activities = "sourceA_prodA_unsubscribe"))
|eval num_sourceA_prodA_unsubscribe = if(isnull(num_sourceA_prodA_unsubscribe), 0, num_sourceA_prodA_unsubscribe)
|eval num_sourceA_prodA_purchase = mvcount(mvfilter(st_product_activities = "sourceA_prodA_purchase"))
|eval num_sourceA_prodA_purchase = if(isnull(num_sourceA_prodA_purchase), 0, num_sourceA_prodA_purchase)
|eval num_sourceA_prodA_return = mvcount(mvfilter(st_product_activities = "sourceA_prodA_return"))
|eval num_sourceA_prodA_return = if(isnull(num_sourceA_prodA_return), 0, num_sourceA_prodA_return)

|eval num_sourceA_prodB_subscribe = mvcount(mvfilter(st_product_activities = "sourceA_prodB_subscribe"))
|eval num_sourceA_prodB_subscribe = if(isnull(num_sourceA_prodB_subscribe), 0, num_sourceA_prodB_subscribe)
|eval num_sourceA_prodB_unsubscribe = mvcount(mvfilter(st_product_activities = "sourceA_prodB_unsubscribe"))
|eval num_sourceA_prodB_unsubscribe = if(isnull(num_sourceA_prodB_unsubscribe), 0, num_sourceA_prodB_unsubscribe)
|eval num_sourceA_prodB_purchase = mvcount(mvfilter(st_product_activities = "sourceA_prodB_purchase"))
|eval num_sourceA_prodB_purchase = if(isnull(num_sourceA_prodB_purchase), 0, num_sourceA_prodB_purchase)
|eval num_sourceA_prodB_return = mvcount(mvfilter(st_product_activities = "sourceA_prodB_return"))
|eval num_sourceA_prodB_return = if(isnull(num_sourceA_prodB_return), 0, num_sourceA_prodB_return)

|eval num_sourceB_prodA_subscribe = mvcount(mvfilter(st_product_activities = "sourceB_prodA_subscribe"))
|eval num_sourceB_prodA_subscribe = if(isnull(num_sourceB_prodA_subscribe), 0, num_sourceB_prodA_subscribe)
|eval num_sourceB_prodA_unsubscribe = mvcount(mvfilter(st_product_activities = "sourceB_prodA_unsubscribe"))
|eval num_sourceB_prodA_unsubscribe = if(isnull(num_sourceB_prodA_unsubscribe), 0, num_sourceB_prodA_unsubscribe)
|eval num_sourceB_prodA_purchase = mvcount(mvfilter(st_product_activities = "sourceB_prodA_purchase"))
|eval num_sourceB_prodA_purchase = if(isnull(num_sourceB_prodA_purchase), 0, num_sourceB_prodA_purchase)
|eval num_sourceB_prodA_return = mvcount(mvfilter(st_product_activities = "sourceB_prodA_return"))
|eval num_sourceB_prodA_return = if(isnull(num_sourceB_prodA_return),0, num_sourceB_prodA_return)

|eval num_sourceB_prodB_subscribe = mvcount(mvfilter(st_product_activities = "sourceB_prodB_subscribe"))
|eval num_sourceB_prodB_subscribe = if(isnull(num_sourceB_prodB_subscribe), 0, num_sourceB_prodB_subscribe)
|eval num_sourceB_prodB_unsubscribe = mvcount(mvfilter(st_product_activities = "sourceB_prodB_unsubscribe"))
|eval num_sourceB_prodB_unsubscribe = if(isnull(num_sourceB_prodB_unsubscribe), 0, num_sourceB_prodB_unsubscribe)
|eval num_sourceB_prodB_purchase = mvcount(mvfilter(st_product_activities = "sourceB_prodB_purchase"))
|eval num_sourceB_prodB_purchase = if(isnull(num_sourceB_prodB_purchase), 0, num_sourceB_prodB_purchase)
|eval num_sourceB_prodB_return = mvcount(mvfilter(st_product_activities = "sourceB_prodB_return"))
|eval num_sourceB_prodB_return = if(isnull(num_sourceB_prodB_return),0, num_sourceB_prodB_return)

Then, by order of precedence (highest precedence processed first at the top), we handle all our apples-and-oranges-to-kumquat cases by reducing the count of each cancelled-out event by 1 (because we are forced to use eval, the logic for this is queer and can be done many ways but I believe my method is the quickest, if perhaps not the most obvious):


|eval special_val=if((num_sourceB_prodA_subscribe>0 AND num_sourceB_prodB_purchase>0 AND num_sourceA_prodA_subscribe>0 AND num_sourceA_prodB_subscribe>0),1,0)
|eval num_sourceB_prodA_subscribe = num_sourceB_prodA_subscribe - special_val
|eval num_sourceB_prodB_purchase = num_sourceB_prodB_purchase - special_val
|eval num_sourceA_prodA_subscribe = num_sourceA_prodA_subscribe - special_val
|eval num_sourceA_prodB_subscribe = num_sourceA_prodB_subscribe - special_val
|eval special_val=if((num_sourceB_prodB_subscribe>0 AND num_sourceB_prodA_purchase>0 AND num_sourceA_prodB_subscribe>0 AND num_sourceA_prodA_subscribe>0),1,0)
|eval num_sourceB_prodB_subscribe = num_sourceB_prodB_subscribe - special_val
|eval num_sourceB_prodA_purchase = num_sourceB_prodA_purchase - special_val
|eval num_sourceA_prodB_subscribe = num_sourceA_prodB_subscribe - special_val
|eval num_sourceA_prodA_subscribe = num_sourceA_prodA_subscribe - special_val
|eval special_val=if((num_sourceB_prodA_unsubscribe>0 AND num_sourceB_prodB_return>0 AND num_sourceA_prodA_unsubscribe>0 AND num_sourceA_prodB_unsubscribe>0),1,0)
|eval num_sourceB_prodA_unsubscribe = num_sourceB_prodA_unsubscribe - special_val
|eval num_sourceB_prodB_return = num_sourceB_prodB_return - special_val
|eval num_sourceA_prodA_unsubscribe = num_sourceA_prodA_unsubscribe - special_val
|eval num_sourceA_prodB_unsubscribe = num_sourceA_prodB_unsubscribe - special_val
|eval special_val=if((num_sourceB_prodB_unsubscribe>0 AND num_sourceB_prodA_return>0 AND num_sourceA_prodB_unsubscribe>0 AND num_sourceA_prodA_unsubscribe>0),1,0)
|eval num_sourceB_prodB_unsubscribe = num_sourceB_prodB_unsubscribe - special_val
|eval num_sourceB_prodA_return = num_sourceB_prodA_return - special_val
|eval num_sourceA_prodB_unsubscribe = num_sourceA_prodB_unsubscribe - special_val
|eval num_sourceA_prodA_unsubscribe = num_sourceA_prodA_unsubscribe - special_val
|eval special_val=if((num_sourceA_prodB_return>0 AND num_sourceA_prodA_unsubscribe>0 AND num_sourceA_prodB_subscribe>0 AND num_sourceA_prodA_purchase>0),1,0)
|eval num_sourceA_prodB_return = num_sourceA_prodB_return - special_val
|eval num_sourceA_prodA_unsubscribe = num_sourceA_prodA_unsubscribe - special_val
|eval num_sourceA_prodB_subscribe = num_sourceA_prodB_subscribe - special_val
|eval num_sourceA_prodA_purchase = num_sourceA_prodA_purchase - special_val
|eval special_val=if((num_sourceA_prodA_return>0 AND num_sourceA_prodB_unsubscribe>0 AND num_sourceA_prodA_subscribe>0 AND num_sourceA_prodB_purchase>0),1,0)
|eval num_sourceA_prodA_return = num_sourceA_prodA_return - special_val
|eval num_sourceA_prodB_unsubscribe = num_sourceA_prodB_unsubscribe - special_val
|eval num_sourceA_prodA_subscribe = num_sourceA_prodA_subscribe - special_val
|eval num_sourceA_prodB_purchase = num_sourceA_prodB_purchase - special_val

Continuing, we handle the more obvious and straightforward cancellations/reductions by reducing the count of each cancelled-out event, one of them drops to zero and the other one drops to the difference of the 2 (again, because we are forced to use eval, the logic for this is queer and can be done many ways but I believe my method is the quickest, if perhaps not the most obvious):

First we eliminate exactly matching events in both data sets:


|eval num_sourceA_prodA_subscribe = max((num_sourceA_prodA_subscribe - num_sourceB_prodA_subscribe), 0)
|eval num_sourceB_prodA_subscribe = max((0 - num_sourceA_prodA_subscribe), 0)
|eval num_sourceA_prodA_unsubscribe = max((num_sourceA_prodA_unsubscribe - num_sourceB_prodA_unsubscribe), 0)
|eval num_sourceB_prodA_unsubscribe = max((0 - num_sourceA_prodA_unsubscribe), 0)
|eval num_sourceA_prodA_purchase = max((num_sourceA_prodA_purchase - num_sourceB_prodA_purchase), 0)
|eval num_sourceB_prodA_purchase = max((0 - num_sourceA_prodA_purchase), 0)
|eval num_sourceA_prodA_return = max((num_sourceA_prodA_return - num_sourceB_prodA_return), 0)
|eval num_sourceB_prodA_return = max((0 - num_sourceA_prodA_return), 0)

|eval num_sourceA_prodB_subscribe = max((num_sourceA_prodB_subscribe - num_sourceB_prodB_subscribe), 0)
|eval num_sourceB_prodB_subscribe = max((0 - num_sourceA_prodB_subscribe), 0)
|eval num_sourceA_prodB_unsubscribe = max((num_sourceA_prodB_unsubscribe - num_sourceB_prodB_unsubscribe), 0)
|eval num_sourceB_prodB_unsubscribe = max((0 - num_sourceA_prodB_unsubscribe), 0)
|eval num_sourceA_prodB_purchase = max((num_sourceA_prodB_purchase - num_sourceB_prodB_purchase), 0)
|eval num_sourceB_prodB_purchase = max((0 - num_sourceA_prodB_purchase), 0)
|eval num_sourceA_prodB_return = max((num_sourceA_prodB_return - num_sourceB_prodB_return), 0)
|eval num_sourceB_prodB_return = max((0 - num_sourceA_prodB_return), 0)

Next, we eliminate "do-nothing" pairs (e.g. a purchase + a refund) of events in each data set:


|eval num_sourceA_prodA_unsubscribe = max((num_sourceA_prodA_unsubscribe - num_sourceA_prodA_subscribe), 0)
|eval num_sourceA_prodA_subscribe = max((0 - num_sourceA_prodA_unsubscribe), 0)
|eval num_sourceA_prodA_return = max((num_sourceA_prodA_return - num_sourceA_prodA_purchase), 0)
|eval num_sourceA_prodA_purchase = max((0 - num_sourceA_prodA_return), 0)

|eval num_sourceA_prodB_unsubscribe = max((num_sourceA_prodB_unsubscribe - num_sourceA_prodB_subscribe), 0)
|eval num_sourceA_prodB_subscribe = max((0 - num_sourceA_prodB_unsubscribe), 0)
|eval num_sourceA_prodB_return = max((num_sourceA_prodB_return - num_sourceA_prodB_purchase), 0)
|eval num_sourceA_prodB_purchase = max((0 - num_sourceA_prodB_return), 0)

|eval num_sourceB_prodA_unsubscribe = max((num_sourceB_prodA_unsubscribe - num_sourceB_prodA_subscribe), 0)
|eval num_sourceB_prodA_subscribe = max((0 - num_sourceB_prodA_unsubscribe), 0)
|eval num_sourceB_prodA_return = max((num_sourceB_prodA_return - num_sourceB_prodA_purchase), 0)
|eval num_sourceB_prodA_purchase = max((0 - num_sourceB_prodA_return), 0)

|eval num_sourceB_prodB_unsubscribe = max((num_sourceB_prodB_unsubscribe - num_sourceB_prodB_subscribe), 0)
|eval num_sourceB_prodB_subscribe = max((0 - num_sourceB_prodB_unsubscribe), 0)
|eval num_sourceB_prodB_return = max((num_sourceB_prodB_return - num_sourceB_prodB_purchase), 0)
|eval num_sourceB_prodB_purchase = max((0 - num_sourceB_prodB_return), 0)

At this point, if our data sets were as they should be, every "num_*" variable should have a value of zero.

Last, we build a field with the cancelled out events missing by consulting the "num_" variables we have just manipulated. From this point forward, we only care if each variable is <=0 (don't add it) or >0 (add exactly 1). In other words, in my specific case, let's say we are selling surgeries, it does not make sense for a user to purchase 2 heart surgeries so if we show that there 10 purchases, these are phantom duplications and we only need to show that (at least) one was purchased and cancel out all the duplicates (dedup):


|eval st_product_activities_filtered = NULL

|eval st_product_activities_filtered = if((num_sourceA_prodA_subscribe > 0),
mvappend(st_product_activities_filtered, "sourceA_prodA_subscribe",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceA_prodA_unsubscribe > 0),
mvappend(st_product_activities_filtered, "sourceA_prodA_subscribe",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceA_prodA_purchase > 0),
mvappend(st_product_activities_filtered, "sourceA_prodA_purchase",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceA_prodA_return > 0),
mvappend(st_product_activities_filtered, "sourceA_prodA_return",NULL), st_product_activities_filtered)

|eval st_product_activities_filtered = if((num_sourceA_prodB_subscribe > 0),
mvappend(st_product_activities_filtered, "sourceA_prodB_subscribe",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceA_prodB_unsubscribe > 0),
mvappend(st_product_activities_filtered, "sourceA_prodB_subscribe",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceA_prodB_purchase > 0),
mvappend(st_product_activities_filtered, "sourceA_prodB_purchase",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceA_prodB_return > 0),
mvappend(st_product_activities_filtered, "sourceA_prodB_return",NULL), st_product_activities_filtered)

|eval st_product_activities_filtered = if((num_sourceB_prodA_subscribe > 0),
mvappend(st_product_activities_filtered, "sourceB_prodA_subscribe",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceB_prodA_unsubscribe > 0),
mvappend(st_product_activities_filtered, "sourceB_prodA_subscribe",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceB_prodA_purchase > 0),
mvappend(st_product_activities_filtered, "sourceB_prodA_purchase",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceB_prodA_return > 0),
mvappend(st_product_activities_filtered, "sourceB_prodA_return",NULL), st_product_activities_filtered)

|eval st_product_activities_filtered = if((num_sourceB_prodB_subscribe > 0),
mvappend(st_product_activities_filtered, "sourceB_prodB_subscribe",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceB_prodB_unsubscribe > 0),
mvappend(st_product_activities_filtered, "sourceB_prodB_subscribe",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceB_prodB_purchase > 0),
mvappend(st_product_activities_filtered, "sourceB_prodB_purchase",NULL), st_product_activities_filtered)
|eval st_product_activities_filtered = if((num_sourceB_prodB_return > 0),
mvappend(st_product_activities_filtered, "sourceB_prodB_return",NULL), st_product_activities_filtered)

Wrapping up, we create a summary of "error types" noting that an "error type" of empty set will disappear from the "stats" restuls which normally is a problem and we take steps to avoid but in this case is a benefit because those are the users where everything cancelled out as it should have!


| fields - num*
| mvcombine st_product_activities_filtered
| stats count(user) AS users by st_product_activities_filtered

Special note, do not use "addcoltotals" because this will prevent you from being able to drill-down by clicking on a line in the search results.

0 Karma
Get Updates on the Splunk Community!

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...