Solved: I need help joining two cvs (customers and purchas...

cromm · ‎07-25-2018

The two csv files I have are customers (fields= customerName,customerID,region,IsActive) with one row per customer and the second is Purchases (fields=customerName, customerID, productName, purchaseDate) with one row per purchase. I am trying to create a list of all the active customers in Northwest region and when they first purchased 5 specific products (the purchases csv include 100+ different products but I only care about these 5). Customers could have bought 0, 1, 2, 3, 4 or all 5 of the target products but after the join I want the customer record to show up five times (once for each product) regardless of it was purchased.

I was thinking of duplicating each customer in the Northwest 4 times and adding a product field with the 5 products (1 per row for each customer) but I don't know how to do that or if it makes sense.

currently this is what I have:
| from inputlookup:"customer.csv"
| where (Region="NW") AND (IsActive="True")
|join customerID type=left max=0
[|inputlookup Purchases.csv
|where (productName="Train" OR productName="Car" OR productName="Truck" OR productName="Bike" OR productName="Scooter")]

This gets me (Tom bought 2 products, Jill bought all 5 and Ken bought none):
customerName,customerID,region,IsActive,customerName, customerID, productName, purchaseDate
Tom,1,NW,True,Tom,1,Train,7/25/12
Tom,1,NW,True,Tom,1,Car,7/2/12
Jill,2,NW,True,Jill,2,Train,5/3/13
Jill,2,NW,True,Jill,2,Car,7/12/15
Jill,2,NW,True,Jill,2,Truck,8/9/10
Jill,2,NW,True,Jill,2,Bike,1/2/13
Jill,2,NW,True,Jill,2,Scooter,4/5/7

Ken,3,NW,True,Ken,3,,,

What I want is (all customers to show up 5 times so Tom would have 3 rows without a purchase date, Jill would stay the same and Ken would have 5 rows without dates):
Tom,1,NW,True,Tom,1,Train,7/25/12
Tom,1,NW,True,Tom,1,Car,7/2/12
Tom,1,NW,True,Tom,1,Truck,
Tom,1,NW,True,Tom,1,Bike,
Tom,1,NW,True,Tom,1,Scooter,
Jill,2,NW,True,Jill,2,Train,5/3/13
Jill,2,NW,True,Jill,2,Car,7/12/15
Jill,2,NW,True,Jill,2,Truck,8/9/10
Jill,2,NW,True,Jill,2,Bike,1/2/13
Jill,2,NW,True,Jill,2,Scooter,4/5/7
Ken,3,NW,True,Ken,3,Train,
Ken,3,NW,True,Ken,3,Car,
Ken,3,NW,True,Ken,3,Truck,
Ken,3,NW,True,Ken,3,Bike,
Ken,3,NW,True,Ken,3,Scooter,

somesoni2 · ‎07-25-2018

Give this a try

| from inputlookup:"customer.csv" 
| where (Region="NW") AND (IsActive="True")
| eval productName=split("Train##Car##Truck##Bike##Scooter","##") 
| mvexpand productName
| lookup Purchases.csv customerID productName OUTPUT purchaseDate

View solution in original post

somesoni2 · ‎07-25-2018

Give this a try

| from inputlookup:"customer.csv" 
| where (Region="NW") AND (IsActive="True")
| eval productName=split("Train##Car##Truck##Bike##Scooter","##") 
| mvexpand productName
| lookup Purchases.csv customerID productName OUTPUT purchaseDate

cromm · ‎07-25-2018

Thanks!! This is almost perfect. There are some customers who bought two different versions of the same product and I only want most recent version. I tried to remove it with:

|eval CustProd= customerName+productName
|dedup UserCert sortby -purchaseDate

but it didn't work. Do you have any ideas of how to get rid of the extra values?

I need help joining two cvs (customers and purchases) that shows what was purchased and if certain products were not purchased

I was thinking of duplicating each customer in the Northwest 4 times and adding a product field with the 5 products (1 per row for each customer) but I don't know how to do that or if it makes sense.

Ken,3,NW,True,Ken,3,,,

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics!

New in Observability Cloud - Explicit Bucket Histograms