Time on websites (total session times)

aaronnicoli
Path Finder

Hi there,

As you would expect, we have a bunch of firewall / content keeper logs in our Splunk instance, and our Splunk guys wish to report on the time a user spends on each website (domain).

Basically, I am trying to see if there is any "easy"...ish way of determining a "session" for each domain and then adding them up to display the total time a user spends on each domain (roughly).

Let's say we start with a generic search against my firewall logs and a specific user,
leaving us with an output of a single user's requests in chronological order.

ANY help you could provide would be very very appreciated.

Thanks,
Aaron.

Ayn
Legend

As you've already discussed it's hard to get really meaningful stats for the reasons cmeo outlines. But, it's certainly possible to create the stats based on the rules you suggested.

If using the firewall logs for this, I don't know exactly what fields are at your disposal - but let's say you have at least a source IP, a destination IP and a destination port. Our unique identifier for a certain web session could be based on these fields. In that case it's possible to build a transaction that joins separate events together to a new combined event (a transaction) based on rules that you specify. Upon creating a transaction, Splunk will write the time difference between its first and last event into a field called duration. What you do is create this transaction saying "join events having the same source IP, destination IP and port, but only if it's less than 30 minutes between one event and the next". Translated to a search, this would look something like:

<yourbasesearch>
| transaction src_ip dest_ip dest_port maxpause=30m

OK, now you have a bunch of transactions with corresponding duration fields that you need to sum together for each "session" to create a grand total. Use stats for this.

<yourbasesearch>
| transaction src_ip dest_ip dest_port maxpause=30m
| stats sum(duration) AS session_time by src_ip,dest_ip,dest_port

This will give you a table with a list of "total session times" for each src_ip/dest_ip/dest_port combination that was found in your search, according to the rules you specified.
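For anyone who wants to see the grouping rule spelled out outside Splunk, here is a minimal Python sketch of the same idea: events that share a key stay in one session as long as consecutive events are no more than 30 minutes apart, and the session durations are then summed per key. The function name, the tuple layout and the sample addresses are made up for illustration; this is not how transaction works internally, just the rule described above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def total_session_time(events, max_pause=timedelta(minutes=30)):
    """Sum session durations per (src_ip, dest_ip, dest_port) key.

    events: iterable of (timestamp, src_ip, dest_ip, dest_port) tuples
            (a hypothetical layout, not a Splunk data structure).
    """
    by_key = defaultdict(list)
    for ts, src_ip, dest_ip, dest_port in events:
        by_key[(src_ip, dest_ip, dest_port)].append(ts)

    totals = {}
    for key, stamps in by_key.items():
        stamps.sort()
        total = timedelta(0)
        start = prev = stamps[0]
        for ts in stamps[1:]:
            if ts - prev > max_pause:
                total += prev - start   # close the current session
                start = ts              # a long pause opens a new one
            prev = ts
        total += prev - start           # close the last session
        totals[key] = total
    return totals

# Made-up sample: two hits 10 minutes apart, then two more 2 hours later.
t0 = datetime(2024, 1, 1, 9, 0)
key = ("10.0.0.5", "93.184.216.34", 443)
sample = [
    (t0, *key),
    (t0 + timedelta(minutes=10), *key),
    (t0 + timedelta(hours=2), *key),
    (t0 + timedelta(hours=2, minutes=5), *key),
]
totals = total_session_time(sample)
print(totals[key])
```

Note that a session with a single event contributes zero time under this rule, which matches the point that a lone hit says nothing about how long the page stayed open.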

carfi
Engager

You could do it by service, say port 80 or 443,
but the max pause will still be an issue.

aaronnicoli
Path Finder

Hmmm... I appear to have something happening that's not quite what I'm after.
Technically, the total time on a single domain should not be able to exceed the time period of the logs specified.
I.e., if I have a base search containing 3 days of logs, I can't be on the site "google.com" for more than 3 days in total.
However, with this search... I am... about 27 days in fact.
Is there no way of calculating this like how I mentioned earlier?
Basically so that the "period" spent on a site is calculated by an actual "timeout" value, rather than just assigning a period of time for every "hit".
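One possible explanation, and this is only an assumption since nothing here confirms it, is that a single domain is served from many destination IPs, so several sessions run at the same time and their durations get stacked on top of each other. A rough Python sketch of counting overlapping sessions only once, so the total can never exceed the window the logs cover (the function name and sample intervals are hypothetical):

```python
from datetime import datetime, timedelta

def covered_time(intervals):
    """Wall-clock time covered by (start, end) intervals, counting overlaps once."""
    total = timedelta(0)
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # No overlap with the running interval: bank it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlap: just extend the running interval if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

# Made-up sessions to three different IPs of the same domain, all overlapping.
t0 = datetime(2024, 1, 1, 9, 0)
sessions = [
    (t0, t0 + timedelta(minutes=20)),
    (t0 + timedelta(minutes=5), t0 + timedelta(minutes=25)),
    (t0 + timedelta(minutes=10), t0 + timedelta(minutes=15)),
]
naive = sum((end - start for start, end in sessions), timedelta(0))
merged = covered_time(sessions)
print(naive, merged)
```

Here the naive sum is 45 minutes while the merged figure is 25 minutes, the same kind of inflation as the 27 days out of 3.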

dwaddle
SplunkTrust

Another thing that would be useful is if webapp session cookies were logged when they are used (like J2EE JSESSIONID) -- then you could identify distinct user sessions according to the activity presented by that session ID

aaronnicoli
Path Finder

You have answered (and explained) absolutely everything I wanted!
Thank you so, so much!

I can now generate exactly what they're after.
Thank you!

aaronnicoli
Path Finder

The key would be session time; in other words, let's say we make it a "magical" 30 minutes.

So, said user connects to a site, then 10 minutes later they connect again... another 5 minutes goes on and they connect once more... then three days later they reconnect and again 60 seconds later... that's it for the month.

This means they spent a total of 10 + 5 + 1 = 16 minutes on that site.

There's no way of even contemplating such a thing...?
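The arithmetic in that example can be checked with a few lines of Python. This is only a sketch of the timeout rule as described: add up the gaps between consecutive hits and ignore any gap longer than the timeout. The function name and the minute-based timestamps are assumptions for illustration.

```python
def minutes_on_site(hit_minutes, timeout=30):
    """Sum the gaps between consecutive hits, skipping gaps longer than timeout."""
    hits = sorted(hit_minutes)
    return sum(b - a for a, b in zip(hits, hits[1:]) if b - a <= timeout)

# Hits at 0, +10 and +5 minutes, then three days later, then 60 seconds after that.
three_days = 3 * 24 * 60
hits = [0, 10, 15, 15 + three_days, 15 + three_days + 1]
print(minutes_on_site(hits))
```

The three-day gap is dropped because it exceeds the 30-minute timeout, leaving 10 + 5 + 1 minutes.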

cmeo
Contributor

I had this same discussion with a customer some months ago. Here is what I sent them:


The problem I thought of with this is--what exactly are you measuring?
HTTP is stateless, so there isn't exactly a start and end of a
session to track...

I came up with some scenarios:

  1. User is interacting with a travel booking site. For the duration of
    their activities, there will be a stream of http traffic, puts and gets
    etc. No problem here.

  2. User opens a newspaper or mag and reads a long article. You might have one set
    of interactions as they get the page; they might sit there reading it
    for half an hour. You won't know anything until they browse the next web
    site. Alternatively, they might skim it in a minute and leave it open
    for half an hour in background. What, then, is the duration of their
    stay at the site?

  3. User opens multiple bookmarks in tabs but doesn't read any of them.
    Any traffic information here might be highly misleading; they might not
    in fact interact with any, but they could be open on the screen all day.

I don't think what you want to do can be done in a meaningful way--not with Splunk anyway.

aaronnicoli
Path Finder

I completely agree and that's what I told the group in the first place.

However, they are keen to at least have some stats that can look shiny... no matter how pointless they truly are.
