Splunk Search

Rank data from web access files

ndoshi
Splunk Employee
Splunk Employee

I have web content (articles, stories) where each article is grouped in a category such as NEWS, STORY, etc. Website visitors are grouped by region. In each region, I want to be able to rank each category by the number of site visitors who read articles in a category.

I can get a count by region and category.

I can get a count by Region, VisitorID, Category.

However, I want to know how many site visitors had CAT1 as their most-read category. How many had CAT1 as their second most-read category? How many had CAT2 as their most-read category?

Here's an example:

**Region    VID      Category   # Visitors who ranked this 1st**
NY           87        STORY               10
NY           44        STORY                9
LA           98        NEWS                 4
1 Solution

eelisio2
Path Finder

Try aggregating by region, visitor id, and classification. Then, use streamstats to create the rank field.

sourcetype="mydata" geo_region=XXX | eval Classification=case(like(pagename, "home%"), "HOME",like(pagename, "%news%") AND like(pagename, "%story%"), "STORY",like(pagename, "%markets%"), "MARKETS",like(pagename, "%personalfinance%"), "PERSONALFINANCE",like(pagename, "%search%"), "SEARCH",like(pagename, "%news%"), "NEWS")|stats count as hitcount by geo_region, visid, Classification | sort geo_region,visid,-hitcount | streamstats count as rank by visid | stats count as rankcount by geo_region, Classification, rank

Breaking down the search command, here are the results from each section:

stats count as hitcount by geo_region, visid, Classification

geo_region      visid   Classification          hitcount
CA              100     HOME                    5
CA              100     NEWS                    10
CA              100     MARKETS                 7
CA              200     NEWS                    5
CA              200     HOME                    10

You want to sort in order of the number of hits most to fewest
sort geo_region,visid,-hitcount

geo_region      visid   Classification          hitcount
CA              100     NEWS                    10
CA              100     MARKETS                 7
CA              100     HOME                    5
CA              200     HOME                    10
CA              200     NEWS                    15

Use streamstats to count the number of rows for each visitor id. By starting with 1 for each visitor id, this creates a rank field.

streamstats count as rank by visid

geo_region      visid   Classification          hitcount        rank
CA              100     NEWS                    10              1
CA              100     MARKETS                 7               2
CA              100     HOME                    5               3
CA              200     NEWS                    15              1
CA              200     HOME                    10              2

Then, count the number of rows per Classification and rank

stats count as rankcount by geo_region, Classification, rank

Classification          rank            rankcount
NEWS                    1               2
MARKETS                 2               1
HOME                    2               1
HOME                    3               1

View solution in original post

eelisio2
Path Finder

Try aggregating by region, visitor id, and classification. Then, use streamstats to create the rank field.

sourcetype="mydata" geo_region=XXX | eval Classification=case(like(pagename, "home%"), "HOME",like(pagename, "%news%") AND like(pagename, "%story%"), "STORY",like(pagename, "%markets%"), "MARKETS",like(pagename, "%personalfinance%"), "PERSONALFINANCE",like(pagename, "%search%"), "SEARCH",like(pagename, "%news%"), "NEWS")|stats count as hitcount by geo_region, visid, Classification | sort geo_region,visid,-hitcount | streamstats count as rank by visid | stats count as rankcount by geo_region, Classification, rank

Breaking down the search command, here are the results from each section:

stats count as hitcount by geo_region, visid, Classification

geo_region      visid   Classification          hitcount
CA              100     HOME                    5
CA              100     NEWS                    10
CA              100     MARKETS                 7
CA              200     NEWS                    5
CA              200     HOME                    10

You want to sort in order of the number of hits most to fewest
sort geo_region,visid,-hitcount

geo_region      visid   Classification          hitcount
CA              100     NEWS                    10
CA              100     MARKETS                 7
CA              100     HOME                    5
CA              200     HOME                    10
CA              200     NEWS                    15

Use streamstats to count the number of rows for each visitor id. By starting with 1 for each visitor id, this creates a rank field.

streamstats count as rank by visid

geo_region      visid   Classification          hitcount        rank
CA              100     NEWS                    10              1
CA              100     MARKETS                 7               2
CA              100     HOME                    5               3
CA              200     NEWS                    15              1
CA              200     HOME                    10              2

Then, count the number of rows per Classification and rank

stats count as rankcount by geo_region, Classification, rank

Classification          rank            rankcount
NEWS                    1               2
MARKETS                 2               1
HOME                    2               1
HOME                    3               1
Get Updates on the Splunk Community!

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

Take a look below to explore our upcoming Community Office Hours, Tech Talks, and Webinars this month. This ...

They're back! Join the SplunkTrust and MVP at .conf24

With our highly anticipated annual conference, .conf, comes the fez-wearers you can trust! The SplunkTrust, as ...

Enterprise Security Content Update (ESCU) | New Releases

Last month, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...