I have web content (articles, stories) where each article is grouped in a category such as NEWS, STORY, etc. Website visitors are grouped by region. In each region, I want to be able to rank each category by the number of site visitors who read articles in a category.
I can get a count by region and category.
I can get a count by Region, VisitorID, Category.
However, I want to know how many site visitors had CAT1 as their most-read category. How many had CAT1 as their second most-read category? How many had CAT2 as their most-read category?
Here's an example:
**Region VID Category # Visitors who ranked this 1st**
NY 87 STORY 10
NY 44 STORY 9
LA 98 NEWS 4
Try aggregating by region, visitor id, and classification. Then, use streamstats to create the rank field.
sourcetype="mydata" geo_region=XXX | eval Classification=case(like(pagename, "home%"), "HOME",like(pagename, "%news%") AND like(pagename, "%story%"), "STORY",like(pagename, "%markets%"), "MARKETS",like(pagename, "%personalfinance%"), "PERSONALFINANCE",like(pagename, "%search%"), "SEARCH",like(pagename, "%news%"), "NEWS")|stats count as hitcount by geo_region, visid, Classification | sort geo_region,visid,-hitcount | streamstats count as rank by visid | stats count as rankcount by geo_region, Classification, rank
Breaking down the search command, here are the results from each section:
stats count as hitcount by geo_region, visid, Classification
geo_region visid Classification hitcount
CA 100 HOME 5
CA 100 NEWS 10
CA 100 MARKETS 7
CA 200 NEWS 5
CA 200 HOME 10
You want to sort in order of the number of hits most to fewest
sort geo_region,visid,-hitcount
geo_region visid Classification hitcount
CA 100 NEWS 10
CA 100 MARKETS 7
CA 100 HOME 5
CA 200 HOME 10
CA 200 NEWS 15
Use streamstats to count the number of rows for each visitor id. By starting with 1 for each visitor id, this creates a rank field.
streamstats count as rank by visid
geo_region visid Classification hitcount rank
CA 100 NEWS 10 1
CA 100 MARKETS 7 2
CA 100 HOME 5 3
CA 200 NEWS 15 1
CA 200 HOME 10 2
Then, count the number of rows per Classification and rank
stats count as rankcount by geo_region, Classification, rank
Classification rank rankcount
NEWS 1 2
MARKETS 2 1
HOME 2 1
HOME 3 1
Try aggregating by region, visitor id, and classification. Then, use streamstats to create the rank field.
sourcetype="mydata" geo_region=XXX | eval Classification=case(like(pagename, "home%"), "HOME",like(pagename, "%news%") AND like(pagename, "%story%"), "STORY",like(pagename, "%markets%"), "MARKETS",like(pagename, "%personalfinance%"), "PERSONALFINANCE",like(pagename, "%search%"), "SEARCH",like(pagename, "%news%"), "NEWS")|stats count as hitcount by geo_region, visid, Classification | sort geo_region,visid,-hitcount | streamstats count as rank by visid | stats count as rankcount by geo_region, Classification, rank
Breaking down the search command, here are the results from each section:
stats count as hitcount by geo_region, visid, Classification
geo_region visid Classification hitcount
CA 100 HOME 5
CA 100 NEWS 10
CA 100 MARKETS 7
CA 200 NEWS 5
CA 200 HOME 10
You want to sort in order of the number of hits most to fewest
sort geo_region,visid,-hitcount
geo_region visid Classification hitcount
CA 100 NEWS 10
CA 100 MARKETS 7
CA 100 HOME 5
CA 200 HOME 10
CA 200 NEWS 15
Use streamstats to count the number of rows for each visitor id. By starting with 1 for each visitor id, this creates a rank field.
streamstats count as rank by visid
geo_region visid Classification hitcount rank
CA 100 NEWS 10 1
CA 100 MARKETS 7 2
CA 100 HOME 5 3
CA 200 NEWS 15 1
CA 200 HOME 10 2
Then, count the number of rows per Classification and rank
stats count as rankcount by geo_region, Classification, rank
Classification rank rankcount
NEWS 1 2
MARKETS 2 1
HOME 2 1
HOME 3 1