Splunk Search

Rank data from web access files

ndoshi
Splunk Employee
Splunk Employee

I have web content (articles, stories) where each article is grouped in a category such as NEWS, STORY, etc. Website visitors are grouped by region. In each region, I want to be able to rank each category by the number of site visitors who read articles in a category.

I can get a count by region and category.

I can get a count by Region, VisitorID, Category.

However, I want to know how many site visitors had CAT1 as their most-read category. How many had CAT1 as their second most-read category? How many had CAT2 as their most-read category?

Here's an example:

**Region    VID      Category   # Visitors who ranked this 1st**
NY           87        STORY               10
NY           44        STORY                9
LA           98        NEWS                 4
1 Solution

eelisio2
Path Finder

Try aggregating by region, visitor id, and classification. Then, use streamstats to create the rank field.

sourcetype="mydata" geo_region=XXX | eval Classification=case(like(pagename, "home%"), "HOME",like(pagename, "%news%") AND like(pagename, "%story%"), "STORY",like(pagename, "%markets%"), "MARKETS",like(pagename, "%personalfinance%"), "PERSONALFINANCE",like(pagename, "%search%"), "SEARCH",like(pagename, "%news%"), "NEWS")|stats count as hitcount by geo_region, visid, Classification | sort geo_region,visid,-hitcount | streamstats count as rank by visid | stats count as rankcount by geo_region, Classification, rank

Breaking down the search command, here are the results from each section:

stats count as hitcount by geo_region, visid, Classification

geo_region      visid   Classification          hitcount
CA              100     HOME                    5
CA              100     NEWS                    10
CA              100     MARKETS                 7
CA              200     NEWS                    5
CA              200     HOME                    10

You want to sort in order of the number of hits most to fewest
sort geo_region,visid,-hitcount

geo_region      visid   Classification          hitcount
CA              100     NEWS                    10
CA              100     MARKETS                 7
CA              100     HOME                    5
CA              200     HOME                    10
CA              200     NEWS                    15

Use streamstats to count the number of rows for each visitor id. By starting with 1 for each visitor id, this creates a rank field.

streamstats count as rank by visid

geo_region      visid   Classification          hitcount        rank
CA              100     NEWS                    10              1
CA              100     MARKETS                 7               2
CA              100     HOME                    5               3
CA              200     NEWS                    15              1
CA              200     HOME                    10              2

Then, count the number of rows per Classification and rank

stats count as rankcount by geo_region, Classification, rank

Classification          rank            rankcount
NEWS                    1               2
MARKETS                 2               1
HOME                    2               1
HOME                    3               1

View solution in original post

eelisio2
Path Finder

Try aggregating by region, visitor id, and classification. Then, use streamstats to create the rank field.

sourcetype="mydata" geo_region=XXX | eval Classification=case(like(pagename, "home%"), "HOME",like(pagename, "%news%") AND like(pagename, "%story%"), "STORY",like(pagename, "%markets%"), "MARKETS",like(pagename, "%personalfinance%"), "PERSONALFINANCE",like(pagename, "%search%"), "SEARCH",like(pagename, "%news%"), "NEWS")|stats count as hitcount by geo_region, visid, Classification | sort geo_region,visid,-hitcount | streamstats count as rank by visid | stats count as rankcount by geo_region, Classification, rank

Breaking down the search command, here are the results from each section:

stats count as hitcount by geo_region, visid, Classification

geo_region      visid   Classification          hitcount
CA              100     HOME                    5
CA              100     NEWS                    10
CA              100     MARKETS                 7
CA              200     NEWS                    5
CA              200     HOME                    10

You want to sort in order of the number of hits most to fewest
sort geo_region,visid,-hitcount

geo_region      visid   Classification          hitcount
CA              100     NEWS                    10
CA              100     MARKETS                 7
CA              100     HOME                    5
CA              200     HOME                    10
CA              200     NEWS                    15

Use streamstats to count the number of rows for each visitor id. By starting with 1 for each visitor id, this creates a rank field.

streamstats count as rank by visid

geo_region      visid   Classification          hitcount        rank
CA              100     NEWS                    10              1
CA              100     MARKETS                 7               2
CA              100     HOME                    5               3
CA              200     NEWS                    15              1
CA              200     HOME                    10              2

Then, count the number of rows per Classification and rank

stats count as rankcount by geo_region, Classification, rank

Classification          rank            rankcount
NEWS                    1               2
MARKETS                 2               1
HOME                    2               1
HOME                    3               1
Get Updates on the Splunk Community!

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...