Splunk Search

I need help joining two cvs (customers and purchases) that shows what was purchased and if certain products were not purchased

cromm
Explorer

The two csv files I have are customers (fields= customerName,customerID,region,IsActive) with one row per customer and the second is Purchases (fields=customerName, customerID, productName, purchaseDate) with one row per purchase. I am trying to create a list of all the active customers in Northwest region and when they first purchased 5 specific products (the purchases csv include 100+ different products but I only care about these 5). Customers could have bought 0, 1, 2, 3, 4 or all 5 of the target products but after the join I want the customer record to show up five times (once for each product) regardless of it was purchased.

I was thinking of duplicating each customer in the Northwest 4 times and adding a product field with the 5 products (1 per row for each customer) but I don't know how to do that or if it makes sense.

currently this is what I have:
| from inputlookup:"customer.csv"
| where (Region="NW") AND (IsActive="True")
|join customerID type=left max=0
[|inputlookup Purchases.csv
|where (productName="Train" OR productName="Car" OR productName="Truck" OR productName="Bike" OR productName="Scooter")]

This gets me (Tom bought 2 products, Jill bought all 5 and Ken bought none):
customerName,customerID,region,IsActive,customerName, customerID, productName, purchaseDate
Tom,1,NW,True,Tom,1,Train,7/25/12
Tom,1,NW,True,Tom,1,Car,7/2/12
Jill,2,NW,True,Jill,2,Train,5/3/13
Jill,2,NW,True,Jill,2,Car,7/12/15
Jill,2,NW,True,Jill,2,Truck,8/9/10
Jill,2,NW,True,Jill,2,Bike,1/2/13
Jill,2,NW,True,Jill,2,Scooter,4/5/7

Ken,3,NW,True,Ken,3,,,

What I want is (all customers to show up 5 times so Tom would have 3 rows without a purchase date, Jill would stay the same and Ken would have 5 rows without dates):
Tom,1,NW,True,Tom,1,Train,7/25/12
Tom,1,NW,True,Tom,1,Car,7/2/12
Tom,1,NW,True,Tom,1,Truck,
Tom,1,NW,True,Tom,1,Bike,
Tom,1,NW,True,Tom,1,Scooter,
Jill,2,NW,True,Jill,2,Train,5/3/13
Jill,2,NW,True,Jill,2,Car,7/12/15
Jill,2,NW,True,Jill,2,Truck,8/9/10
Jill,2,NW,True,Jill,2,Bike,1/2/13
Jill,2,NW,True,Jill,2,Scooter,4/5/7
Ken,3,NW,True,Ken,3,Train,
Ken,3,NW,True,Ken,3,Car,
Ken,3,NW,True,Ken,3,Truck,
Ken,3,NW,True,Ken,3,Bike,
Ken,3,NW,True,Ken,3,Scooter,

0 Karma
1 Solution

somesoni2
Revered Legend

Give this a try

| from inputlookup:"customer.csv" 
| where (Region="NW") AND (IsActive="True")
| eval productName=split("Train##Car##Truck##Bike##Scooter","##") 
| mvexpand productName
| lookup Purchases.csv customerID productName OUTPUT purchaseDate

View solution in original post

somesoni2
Revered Legend

Give this a try

| from inputlookup:"customer.csv" 
| where (Region="NW") AND (IsActive="True")
| eval productName=split("Train##Car##Truck##Bike##Scooter","##") 
| mvexpand productName
| lookup Purchases.csv customerID productName OUTPUT purchaseDate

cromm
Explorer

Thanks!! This is almost perfect. There are some customers who bought two different versions of the same product and I only want most recent version. I tried to remove it with:

|eval CustProd= customerName+productName
|dedup UserCert sortby -purchaseDate

but it didn't work. Do you have any ideas of how to get rid of the extra values?

0 Karma
Get Updates on the Splunk Community!

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...

New in Observability Cloud - Explicit Bucket Histograms

Splunk introduces native support for histograms as a metric data type within Observability Cloud with Explicit ...