Splunk Search

How to get percentages of value pairs

behudelson
Path Finder

I have a very large set of retail data. The significant fields for this query are store_no, transaction_amt, zip, ethnicity.

I am attempting to get a report for a single store the percentage of sales amounts by ethnicity for each zip code.

So what I am aiming for is a result set like this:

                        46208      46219       46222
"A: African American"   .25         .23         .40
"B: Asian"              .06         .10         .14
"C: Hispanic"           .21         .22         .35
"D: White"              .48         .45         .11

So far I have been able to get the total sales for each ethnicity at a store, but I don't know how to report out on the zip code values.

Here is what I have so far:

source="hhg-butler-data.csv" host="jw0mt32" sourcetype="hhg"  
| where (store_no = 243) 
| search ethnicity!="E: Unknown"
| stats 
  sum(transaction_amt) as TotalSales 
  sum(eval(if(ethnicity="A: African American", transaction_amt, null()))) as TotalAfricanAmericanSales
  sum(eval(if(ethnicity="B: Asian", transaction_amt, null()))) as TotalAsianSales
  sum(eval(if(ethnicity="C: Hispanic", transaction_amt, null()))) as TotalHispanicSales
  sum(eval(if(ethnicity="D: White", transaction_amt, null()))) as TotalWhiteSales  
  count as TotalCount
  dc(zip) as dcZip
  by store_no

| eval PercentAfricanAmericanSales=round(TotalAfricanAmericanSales/TotalSales, 6) * 100
| eval PercentAsianSales=round(TotalAsianSales/TotalSales, 6) * 100 
| eval PercentHispanicSales=round(TotalHispanicSales/TotalSales, 6) * 100 
| eval PercentWhiteSales=round(TotalWhiteSales/TotalSales, 6) * 100

| table TotalSales, TotalAfricanAmericanSales, PercentAfricanAmericanSales, TotalAsianSales, PercentAsianSales, TotalHispanicSales, PercentHispanicSales, TotalWhiteSales, PercentWhiteSales, TotalCount, dcZip

Thank you and please let me know if I can clarify what I am asking.

Tags (1)
0 Karma
1 Solution

niketn
Legend

@behudelson,

1) Your first should where and search should be merged in the base search for filtering required results upfront.
2) If your final output requires transaction amount by zip and ethnicity, store number is not required for stats.

Since we need statistics by zip code and then output by ethnicity, I have used eventstats and foreach to calculate percent by zip and then did a transpose to reverse axis in the end.

 source="hhg-butler-data.csv" host="jw0mt32" sourcetype="hhg" store_no="243" ethnicity!="E: Unknown"
|  stats sum(transaction_amt) as transaction_amt by zip ethnicity
|  eventstats sum(transaction_amt) as Total by zip
|  chart values(Total) as Total sum(transaction_amt) as transaction_amt by zip ethnicity
|  rename "Total: A: African American" as Total
|  fields - "Total: *"
|  foreach "transaction_amt: *" 
    [eval "perc: <<MATCHSTR>>"=round(('<<FIELD>>'/Total)*100,1)]
|  fields zip perc*
|  rename "perc: *" as *
|  transpose header_field="zip" column_name="ethnicity"

Following is a run anywhere search to test out (makeresults with append command mock data as per the question):

|  makeresults
|  eval ethnicity="A: African American", zip=46208, transaction_amt=2000
|  append 
    [|  makeresults
|  eval ethnicity="A: African American", zip=46219, transaction_amt=1500]
|  append 
    [|  makeresults
|  eval ethnicity="A: African American", zip=46222, transaction_amt=2500]
|  append 
    [|  makeresults
|  eval ethnicity="B: Asian", zip=46208, transaction_amt=1100]
|  append 
    [|  makeresults
|  eval ethnicity="B: Asian", zip=46219, transaction_amt=1300]
|  append 
    [|  makeresults
|  eval ethnicity="B: Asian", zip=46222, transaction_amt=2500]
|  append 
    [|  makeresults
|  eval ethnicity="C: Hispanic", zip=46208, transaction_amt=1200]
|  append 
    [|  makeresults
|  eval ethnicity="C: Hispanic", zip=46219, transaction_amt=2000]
|  append 
    [|  makeresults
|  eval ethnicity="C: Hispanic", zip=46222, transaction_amt=2300]
|  append 
    [|  makeresults
|  eval ethnicity="D: White", zip=46208, transaction_amt=1800]
|  append 
    [|  makeresults
|  eval ethnicity="D: White", zip=46219, transaction_amt=1570]
|  append 
    [|  makeresults
|  eval ethnicity="D: White", zip=46222, transaction_amt=1350]
|  stats sum(transaction_amt) as transaction_amt by zip ethnicity
|  eventstats sum(transaction_amt) as Total by zip
|  chart values(Total) as Total sum(transaction_amt) as transaction_amt by zip ethnicity
|  rename "Total: A: African American" as Total
|  fields - "Total: *"
|  foreach "transaction_amt: *" 
    [eval "perc: <<MATCHSTR>>"=round(('<<FIELD>>'/Total)*100,1)]
|  fields zip perc*
|  rename "perc: *" as *
|  transpose header_field="zip" column_name="ethnicity"
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

View solution in original post

niketn
Legend

@behudelson,

1) Your first should where and search should be merged in the base search for filtering required results upfront.
2) If your final output requires transaction amount by zip and ethnicity, store number is not required for stats.

Since we need statistics by zip code and then output by ethnicity, I have used eventstats and foreach to calculate percent by zip and then did a transpose to reverse axis in the end.

 source="hhg-butler-data.csv" host="jw0mt32" sourcetype="hhg" store_no="243" ethnicity!="E: Unknown"
|  stats sum(transaction_amt) as transaction_amt by zip ethnicity
|  eventstats sum(transaction_amt) as Total by zip
|  chart values(Total) as Total sum(transaction_amt) as transaction_amt by zip ethnicity
|  rename "Total: A: African American" as Total
|  fields - "Total: *"
|  foreach "transaction_amt: *" 
    [eval "perc: <<MATCHSTR>>"=round(('<<FIELD>>'/Total)*100,1)]
|  fields zip perc*
|  rename "perc: *" as *
|  transpose header_field="zip" column_name="ethnicity"

Following is a run anywhere search to test out (makeresults with append command mock data as per the question):

|  makeresults
|  eval ethnicity="A: African American", zip=46208, transaction_amt=2000
|  append 
    [|  makeresults
|  eval ethnicity="A: African American", zip=46219, transaction_amt=1500]
|  append 
    [|  makeresults
|  eval ethnicity="A: African American", zip=46222, transaction_amt=2500]
|  append 
    [|  makeresults
|  eval ethnicity="B: Asian", zip=46208, transaction_amt=1100]
|  append 
    [|  makeresults
|  eval ethnicity="B: Asian", zip=46219, transaction_amt=1300]
|  append 
    [|  makeresults
|  eval ethnicity="B: Asian", zip=46222, transaction_amt=2500]
|  append 
    [|  makeresults
|  eval ethnicity="C: Hispanic", zip=46208, transaction_amt=1200]
|  append 
    [|  makeresults
|  eval ethnicity="C: Hispanic", zip=46219, transaction_amt=2000]
|  append 
    [|  makeresults
|  eval ethnicity="C: Hispanic", zip=46222, transaction_amt=2300]
|  append 
    [|  makeresults
|  eval ethnicity="D: White", zip=46208, transaction_amt=1800]
|  append 
    [|  makeresults
|  eval ethnicity="D: White", zip=46219, transaction_amt=1570]
|  append 
    [|  makeresults
|  eval ethnicity="D: White", zip=46222, transaction_amt=1350]
|  stats sum(transaction_amt) as transaction_amt by zip ethnicity
|  eventstats sum(transaction_amt) as Total by zip
|  chart values(Total) as Total sum(transaction_amt) as transaction_amt by zip ethnicity
|  rename "Total: A: African American" as Total
|  fields - "Total: *"
|  foreach "transaction_amt: *" 
    [eval "perc: <<MATCHSTR>>"=round(('<<FIELD>>'/Total)*100,1)]
|  fields zip perc*
|  rename "perc: *" as *
|  transpose header_field="zip" column_name="ethnicity"
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

behudelson
Path Finder

Hi @niketnilay, I will study this and follow up if I have questions. Thank you for such a quick and detailed response - very cool!

0 Karma

niketn
Legend

@behudelson, second search is a run anywhere search. From the | stats command onward you can try out one command at a time to see and understand how each one of them behave.

You should also refer to Splunk Docs for each command used for reference. For example, transpose command has a default limit of 5. You will definitely have more zip codes than that, so you may have to use transpose with required limit.

 |  transpose 5 header_field="zip" column_name="ethnicity"

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Transpose

Do let us know if you have any questions and accept the Answer once your issue is resolved 🙂

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma
Get Updates on the Splunk Community!

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...

What’s New in Splunk Security Essentials 3.8.0?

Splunk Security Essentials (SSE) is an app that can amplify the power of your existing Splunk Cloud Platform, ...

Let’s Get You Certified – Vegas-Style at .conf24

Are you ready to level up your Splunk game? Then, let’s get you certified live at .conf24 – our annual user ...