IIS, Visitor sessions & summary indexing - oh my!

stjack99
Explorer

I need help figuring out how to store visitor session info into a summary index.

First, what I want to be able to do: query the summary index and return how many visitor sessions our site had in x-time. I also want to return how many visitor sessions a particular path (say /path/y) had within x-time.

I tried building a saved search that wrote to the summary index, with web events grouped into transactions by IP and user agent:

host="webserver" | transaction ip UserAgent maxspan=30m | stats values(uri_stem) by ip UserAgent

This appeared to work until I noticed that each day had exactly 10,000 visitor sessions. The query was hitting a limit. This made me realize that dumping all of the data into the summary index doesn't actually gain me anything, since nothing is being summarized 🙂

The other approach I considered was to calculate the number of visitor sessions per day and save that into the summary index. The problem here is that I would only be able to tell how many sessions there were for the whole site; I couldn't get sessions for just a subsection.

Any ideas on how to get both searches to work from the same summary index data? I don't want to have to set up a new summary index search every time someone thinks of something new to search for.

Thanks!

Stephen_Sorkin
Splunk Employee

You are correct in your observation that storing the full information about each (user, uri_stem) pair isn't going to be much better than a search over your raw data. Another problem is that if you don't store distinct user ids, and you just store distinct counts, you can't combine several time periods (say, the hours in a day) to form a whole (the whole day), because the same visitor can be counted in more than one period.
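To make the overlap problem concrete, here is a minimal Python sketch (not Splunk code). The visitor ids and the two hourly buckets are made up for illustration; the point is only that adding per-period distinct counts overstates the distinct count for the combined period.

```python
# Hypothetical visitor ids (ip + UserAgent) seen in two consecutive hours.
hour_1 = {"10.0.0.1|Firefox", "10.0.0.2|Chrome", "10.0.0.3|Safari"}
hour_2 = {"10.0.0.2|Chrome", "10.0.0.3|Safari", "10.0.0.4|Edge"}


def distinct_count(ids):
    """Rough equivalent of dc(uid) over one period."""
    return len(set(ids))


# What you get if the summary stores only the counts and you add them up.
summed = distinct_count(hour_1) + distinct_count(hour_2)

# What you actually want: distinct visitors across both hours.
true_total = distinct_count(hour_1 | hour_2)

print(summed, true_total)  # 6 4
```

Two visitors appear in both hours, so the summed figure is too high by two. This is why the summary needs to be computed at the granularity you intend to report on.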

You should choose the time granularities that you need to report on and summarize distinct counts for each uri_stem and the site as a whole for that period, and persist this into the summary index.

Your search would be:

... | eval uid = ip + UserAgent | stats dc(uid) as visitors by uri_stem

Then reporting on this would be:

index=summary source=search_name | stats max(visitors) by uri_stem

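If you want to look at the site as a whole, you can't add up the per-uri_stem counts, since a visitor who hits several paths is counted under each one. However, you can save off a row in the summary for "ALL" using multivalued fields. Appending " ALL" to uri_stem and splitting it with makemv makes every event count under both its own path and ALL: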

... | eval uid = ip + UserAgent | eval uri_stem = uri_stem + " ALL" | makemv uri_stem | stats dc(uid) as visitors by uri_stem
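As a rough model of what that search computes, the Python sketch below fans each event out to two keys, its own uri_stem and "ALL", and counts distinct uids per key. The events are invented for illustration, and the code assumes uri_stem values contain no spaces (which is also what the space-delimited makemv split relies on).

```python
from collections import defaultdict

# Hypothetical events: (ip, UserAgent, uri_stem)
events = [
    ("10.0.0.1", "Firefox", "/home"),
    ("10.0.0.1", "Firefox", "/docs"),
    ("10.0.0.2", "Chrome", "/home"),
    ("10.0.0.3", "Safari", "/docs"),
]


def visitors_by_uri_stem(events):
    """Distinct visitors per uri_stem, plus an 'ALL' row for the whole site."""
    seen = defaultdict(set)
    for ip, user_agent, uri_stem in events:
        uid = ip + user_agent
        # Mimics: eval uri_stem = uri_stem + " ALL" | makemv uri_stem
        for key in (uri_stem + " ALL").split(" "):
            seen[key].add(uid)
    return {key: len(uids) for key, uids in seen.items()}


summary = visitors_by_uri_stem(events)
print(summary)  # {'/home': 2, 'ALL': 3, '/docs': 2}
```

The per-path counts add up to 4, but the ALL row correctly reports 3 distinct visitors, because the first visitor hit both paths.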

Then you could report on the site using:

index=summary source=search_name uri_stem=ALL | stats max(visitors)

And by individual paths using:

index=summary source=search_name uri_stem!=ALL | stats max(visitors) by uri_stem
