Hi there,
We have, as you would expect, a bunch of firewall / content keeper logs in our Splunk instance, and our Splunk guys wish to report on the time a user spends on each website (domain).
Basically, I am trying to see if there is any "easy"...ish way of determining a "session" for each domain and then adding them up to display the total time a user spends on each domain (roughly).
Let's say we start with a generic search against my firewall logs and a specific user.
That leaves us with an output of a single user's requests in chronological order.
ANY help you could provide would be very very appreciated.
Thanks,
Aaron.
As you've already discussed it's hard to get really meaningful stats for the reasons cmeo outlines. But, it's certainly possible to create the stats based on the rules you suggested.
If using the firewall logs for this, I don't know exactly what fields are at your disposal - but let's say you have at least a source IP, a destination IP and a destination port. Our unique identifier for a certain web session could be based on these fields. In that case it's possible to build a transaction that joins separate events together to a new combined event (a transaction) based on rules that you specify. Upon creating a transaction, Splunk will write the time difference between its first and last event into a field called duration.
What you do is create this transaction saying "join events having the same source IP, destination IP and port, but only if it's less than 30 minutes between one event and the next". Translated to a search, this would look something like:
<yourbasesearch>
| transaction src_ip dest_ip dest_port maxpause=30m
OK, now you have a bunch of transactions with corresponding duration fields that you need to sum together for each "session" to create a grand total. Use stats for this.
<yourbasesearch>
| transaction src_ip dest_ip dest_port maxpause=30m
| stats sum(duration) AS session_time by src_ip,dest_ip,dest_port
This will give you a table with a list of "total session times" for each srcIP/destIP/destport pair that was found in your search, according to the rules you specified.
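For anyone who wants to see the logic outside of Splunk, here is a rough Python sketch of what the transaction + stats search above is doing. It is only an illustration, not how Splunk implements it: the function name session_totals, the tuple layout of the events and the sample IP addresses are all made up for the example.

```python
from collections import defaultdict

MAX_PAUSE = 30 * 60  # seconds; mirrors maxpause=30m in the search


def session_totals(events, max_pause=MAX_PAUSE):
    """events: iterable of (epoch_seconds, src_ip, dest_ip, dest_port).

    Returns {(src_ip, dest_ip, dest_port): total_seconds}, where the total
    is the sum of the durations of all "transactions" for that key.
    """
    times_by_key = defaultdict(list)
    for ts, src_ip, dest_ip, dest_port in events:
        times_by_key[(src_ip, dest_ip, dest_port)].append(ts)

    totals = {}
    for key, times in times_by_key.items():
        times.sort()
        total = 0
        start = prev = times[0]
        for ts in times[1:]:
            if ts - prev > max_pause:
                # Pause too long: close the current transaction, start a new one.
                total += prev - start
                start = ts
            prev = ts
        total += prev - start  # close the final transaction
        totals[key] = total
    return totals


# Hypothetical sample data: three hits close together, then two more later.
key = ("10.0.0.5", "192.0.2.10", 80)
events = [(t, *key) for t in (0, 600, 900, 10000, 10060)]
print(session_totals(events))
```

Note that a transaction containing a single event has a duration of 0, so isolated hits add nothing to the total.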
You could filter by service, say port 80 or 443, but the max pause will still be an issue.
Hmmm... I appear to have something happening that's not quite what I'm after.
Technically, the total time on a single domain should not be able to exceed the time period of the logs specified.
I.e. if I have a base search containing 3 days of logs, I can't be on the site "google.com" for more than 3 days in total.
However, with this search... I am... about 27 days in fact.
Is there no way of calculating this like how I mentioned earlier?
Basically so that the "period" spent on a site is calculated by an actual "timeout" value, rather than just assigning a period of time for every "hit".
Another thing that would be useful is if webapp session cookies were logged when they are used (like J2EE JSESSIONID) -- then you could identify distinct user sessions according to the activity presented by that session ID
You have answered (and explained) absolutely everything I wanted!
Thank you so, so much!
I can now generate exactly what they're after.
Thank you!
The key would be session time, in other words let's say we make it a "magical" 30 minutes.
So, said user connects to a site, then 10 minutes later they connect again... another 5 minutes goes on and they connect once more... then three days later they reconnect and again 60 seconds later... that's it for the month.
This means they spent a total of 10 + 5 + 1 = 16 minutes on that site.
There's no way of even contemplating such a thing...?
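To make the arithmetic above concrete, here is a small Python sketch of that timeout rule: sort one user's hits on one domain, and count the gap between two consecutive hits only when it is within the timeout. The function name time_on_site and the sample timestamps are purely illustrative, and it assumes you have already reduced the logs to a list of epoch-second timestamps for a single user and domain.

```python
TIMEOUT = 30 * 60  # the "magical" 30 minutes, in seconds


def time_on_site(timestamps, timeout=TIMEOUT):
    """Sum the gaps between consecutive hits that are within the timeout."""
    ordered = sorted(timestamps)
    return sum(
        later - earlier
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier <= timeout
    )


MINUTE = 60
DAY = 24 * 60 * MINUTE

# Connect, again 10 minutes later, again 5 minutes later,
# then three days later, and once more 60 seconds after that.
hits = [
    0,
    10 * MINUTE,
    15 * MINUTE,
    15 * MINUTE + 3 * DAY,
    15 * MINUTE + 3 * DAY + 1 * MINUTE,
]
print(time_on_site(hits) / MINUTE)  # 16.0
```

Because only the gaps between consecutive hits are counted, the total for one user and one domain can never be larger than the time between their first and last hit, so it cannot exceed the period covered by the logs.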
I have had this same discussion with a customer some months ago. Here is what I sent them:
The problem I thought of with this is--what exactly are you measuring?
HTTP is stateless, so there isn't exactly a start and end of a
session to track...
I came up with some scenarios:
User is interacting with a travel booking site. For the duration of
their activities, there will be a stream of http traffic, puts and gets
etc. No problem here.
User opens a newspaper or mag and reads a long article. You might have one set
of interactions as they get the page; they might sit there reading it
for half an hour. You won't know anything until they browse the next web
site. Alternatively, they might skim it in a minute and leave it open
for half an hour in background. What, then, is the duration of their
stay at the site?
User opens multiple bookmarks in tabs but doesn't read any of them.
Any traffic information here might be highly misleading; they might not
in fact interact with any, but they could be open on the screen all day.
I don't think what you want to do can be done in a meaningful way--not with Splunk anyway.
I completely agree and that's what I told the group in the first place.
However, they are keen to at least have some stats that can look shiny... no matter how pointless they truly are.