I have a very large set of retail data. The significant fields for this query are store_no, transaction_amt, zip, ethnicity.
I am attempting to get a report for a single store the percentage of sales amounts by ethnicity for each zip code.
So what I am aiming for is a result set like this:
46208 46219 46222
"A: African American" .25 .23 .40
"B: Asian" .06 .10 .14
"C: Hispanic" .21 .22 .35
"D: White" .48 .45 .11
So far I have been able to get the total sales for each ethnicity at a store, but I don't know how to report out on the zip code values.
Here is what I have so far:
source="hhg-butler-data.csv" host="jw0mt32" sourcetype="hhg"
| where (store_no = 243)
| search ethnicity!="E: Unknown"
| stats
sum(transaction_amt) as TotalSales
sum(eval(if(ethnicity="A: African American", transaction_amt, null()))) as TotalAfricanAmericanSales
sum(eval(if(ethnicity="B: Asian", transaction_amt, null()))) as TotalAsianSales
sum(eval(if(ethnicity="C: Hispanic", transaction_amt, null()))) as TotalHispanicSales
sum(eval(if(ethnicity="D: White", transaction_amt, null()))) as TotalWhiteSales
count as TotalCount
dc(zip) as dcZip
by store_no
| eval PercentAfricanAmericanSales=round(TotalAfricanAmericanSales/TotalSales, 6) * 100
| eval PercentAsianSales=round(TotalAsianSales/TotalSales, 6) * 100
| eval PercentHispanicSales=round(TotalHispanicSales/TotalSales, 6) * 100
| eval PercentWhiteSales=round(TotalWhiteSales/TotalSales, 6) * 100
| table TotalSales, TotalAfricanAmericanSales, PercentAfricanAmericanSales, TotalAsianSales, PercentAsianSales, TotalHispanicSales, PercentHispanicSales, TotalWhiteSales, PercentWhiteSales, TotalCount, dcZip
Thank you and please let me know if I can clarify what I am asking.
@behudelson,
1) Your first should where and search should be merged in the base search for filtering required results upfront.
2) If your final output requires transaction amount by zip and ethnicity, store number is not required for stats.
Since we need statistics by zip code and then output by ethnicity, I have used eventstats
and foreach
to calculate percent by zip and then did a transpose
to reverse axis in the end.
source="hhg-butler-data.csv" host="jw0mt32" sourcetype="hhg" store_no="243" ethnicity!="E: Unknown"
| stats sum(transaction_amt) as transaction_amt by zip ethnicity
| eventstats sum(transaction_amt) as Total by zip
| chart values(Total) as Total sum(transaction_amt) as transaction_amt by zip ethnicity
| rename "Total: A: African American" as Total
| fields - "Total: *"
| foreach "transaction_amt: *"
[eval "perc: <<MATCHSTR>>"=round(('<<FIELD>>'/Total)*100,1)]
| fields zip perc*
| rename "perc: *" as *
| transpose header_field="zip" column_name="ethnicity"
Following is a run anywhere search to test out (makeresults with append command mock data as per the question):
| makeresults
| eval ethnicity="A: African American", zip=46208, transaction_amt=2000
| append
[| makeresults
| eval ethnicity="A: African American", zip=46219, transaction_amt=1500]
| append
[| makeresults
| eval ethnicity="A: African American", zip=46222, transaction_amt=2500]
| append
[| makeresults
| eval ethnicity="B: Asian", zip=46208, transaction_amt=1100]
| append
[| makeresults
| eval ethnicity="B: Asian", zip=46219, transaction_amt=1300]
| append
[| makeresults
| eval ethnicity="B: Asian", zip=46222, transaction_amt=2500]
| append
[| makeresults
| eval ethnicity="C: Hispanic", zip=46208, transaction_amt=1200]
| append
[| makeresults
| eval ethnicity="C: Hispanic", zip=46219, transaction_amt=2000]
| append
[| makeresults
| eval ethnicity="C: Hispanic", zip=46222, transaction_amt=2300]
| append
[| makeresults
| eval ethnicity="D: White", zip=46208, transaction_amt=1800]
| append
[| makeresults
| eval ethnicity="D: White", zip=46219, transaction_amt=1570]
| append
[| makeresults
| eval ethnicity="D: White", zip=46222, transaction_amt=1350]
| stats sum(transaction_amt) as transaction_amt by zip ethnicity
| eventstats sum(transaction_amt) as Total by zip
| chart values(Total) as Total sum(transaction_amt) as transaction_amt by zip ethnicity
| rename "Total: A: African American" as Total
| fields - "Total: *"
| foreach "transaction_amt: *"
[eval "perc: <<MATCHSTR>>"=round(('<<FIELD>>'/Total)*100,1)]
| fields zip perc*
| rename "perc: *" as *
| transpose header_field="zip" column_name="ethnicity"
@behudelson,
1) Your first should where and search should be merged in the base search for filtering required results upfront.
2) If your final output requires transaction amount by zip and ethnicity, store number is not required for stats.
Since we need statistics by zip code and then output by ethnicity, I have used eventstats
and foreach
to calculate percent by zip and then did a transpose
to reverse axis in the end.
source="hhg-butler-data.csv" host="jw0mt32" sourcetype="hhg" store_no="243" ethnicity!="E: Unknown"
| stats sum(transaction_amt) as transaction_amt by zip ethnicity
| eventstats sum(transaction_amt) as Total by zip
| chart values(Total) as Total sum(transaction_amt) as transaction_amt by zip ethnicity
| rename "Total: A: African American" as Total
| fields - "Total: *"
| foreach "transaction_amt: *"
[eval "perc: <<MATCHSTR>>"=round(('<<FIELD>>'/Total)*100,1)]
| fields zip perc*
| rename "perc: *" as *
| transpose header_field="zip" column_name="ethnicity"
Following is a run anywhere search to test out (makeresults with append command mock data as per the question):
| makeresults
| eval ethnicity="A: African American", zip=46208, transaction_amt=2000
| append
[| makeresults
| eval ethnicity="A: African American", zip=46219, transaction_amt=1500]
| append
[| makeresults
| eval ethnicity="A: African American", zip=46222, transaction_amt=2500]
| append
[| makeresults
| eval ethnicity="B: Asian", zip=46208, transaction_amt=1100]
| append
[| makeresults
| eval ethnicity="B: Asian", zip=46219, transaction_amt=1300]
| append
[| makeresults
| eval ethnicity="B: Asian", zip=46222, transaction_amt=2500]
| append
[| makeresults
| eval ethnicity="C: Hispanic", zip=46208, transaction_amt=1200]
| append
[| makeresults
| eval ethnicity="C: Hispanic", zip=46219, transaction_amt=2000]
| append
[| makeresults
| eval ethnicity="C: Hispanic", zip=46222, transaction_amt=2300]
| append
[| makeresults
| eval ethnicity="D: White", zip=46208, transaction_amt=1800]
| append
[| makeresults
| eval ethnicity="D: White", zip=46219, transaction_amt=1570]
| append
[| makeresults
| eval ethnicity="D: White", zip=46222, transaction_amt=1350]
| stats sum(transaction_amt) as transaction_amt by zip ethnicity
| eventstats sum(transaction_amt) as Total by zip
| chart values(Total) as Total sum(transaction_amt) as transaction_amt by zip ethnicity
| rename "Total: A: African American" as Total
| fields - "Total: *"
| foreach "transaction_amt: *"
[eval "perc: <<MATCHSTR>>"=round(('<<FIELD>>'/Total)*100,1)]
| fields zip perc*
| rename "perc: *" as *
| transpose header_field="zip" column_name="ethnicity"
Hi @niketnilay, I will study this and follow up if I have questions. Thank you for such a quick and detailed response - very cool!
@behudelson, second search is a run anywhere search. From the | stats
command onward you can try out one command at a time to see and understand how each one of them behave.
You should also refer to Splunk Docs for each command used for reference. For example, transpose
command has a default limit of 5. You will definitely have more zip codes than that, so you may have to use transpose with required limit.
| transpose 5 header_field="zip" column_name="ethnicity"
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Transpose
Do let us know if you have any questions and accept the Answer once your issue is resolved 🙂