Splunk Search

Determining time spent browsing by user from proxy logs

JovanMilosevic
Path Finder

I have some proxy logs in the squid format. Most entries have a user, but some do not. I can create a transaction by source IP, and that transaction will give me the duration. However, where some of the logs lack a user, the transaction will contain two user IDs: the real user and "-". Below is an example, leaving out non-relevant fields...

1373936631.040 .. 192.168.0.227 ... GET http://.. bill%20jones NONE/- text/html
1373936633.436 .. 192.168.0.227 ... GET http://.. - NONE/- text/html
1373936651.746 .. 192.168.0.227 ... GET http://.. bill%20jones DIRECT/122.160.120.150 video/x-flv

From this, I would like to output the duration, source IP, and the user as bill%20jones

20.706 192.168.0.227 bill%20jones

Is there a way to iterate through the multi-value usernames, ignoring the user "-", and coming back with the first value that isn't "-"?
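The arithmetic being asked for can be pictured outside SPL. This is just a Python sketch of the logic applied to the three sample events above (not a Splunk search): take last minus first timestamp for the duration, and the first username that isn't "-".

```python
# Sketch (Python, not SPL) of the desired result: from the three example
# events above, compute the session duration and the first non-"-" user.
events = [
    (1373936631.040, "192.168.0.227", "bill%20jones"),
    (1373936633.436, "192.168.0.227", "-"),
    (1373936651.746, "192.168.0.227", "bill%20jones"),
]

times = [t for t, _, _ in events]
duration = max(times) - min(times)  # 1373936651.746 - 1373936631.040 = 20.706

# Equivalent of filtering the multivalue field and taking the first entry
users = [u for _, _, u in events if u != "-"]
user = users[0] if users else None

print(round(duration, 3), events[0][1], user)
```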

Thanks.

1 Solution

JovanMilosevic
Path Finder

I also found a way that gets over the issue with the above solution, though it has the downside of needing to process the events twice.

First I created a time based lookup...

[UserLookup]
default_match = Unknown
filename = userlookup.csv
max_matches = 1
min_matches = 1
time_field = from
time_format = %Y-%m-%d %H:%M:%S

First pass populates the lookup like so..

sourcetype=squid username!="-" 
| stats min(_time) as from by username clientip 
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(from) 
| outputlookup UserLookup
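As a mental model of what the time-based lookup then does (my understanding of Splunk's time-bounded matching with max_matches = 1): each event is resolved to the most recent lookup row at or before the event's _time for that clientip, falling back to default_match. A rough Python sketch, with hypothetical table rows:

```python
# Rough sketch (Python, not SPL) of time-bounded lookup resolution:
# pick the most recent lookup row at or before the event time for the
# matching clientip. The table rows here are hypothetical examples.
lookup = [
    # (from_time, clientip, username)
    (1373936631.040, "192.168.0.227", "bill%20jones"),
    (1373940000.000, "192.168.0.227", "alice"),
]

def resolve(event_time, clientip, default="Unknown"):
    candidates = [r for r in lookup
                  if r[1] == clientip and r[0] <= event_time]
    if not candidates:
        return default         # mirrors default_match = Unknown
    return max(candidates)[2]  # most recent "from" time wins

print(resolve(1373936700.0, "192.168.0.227"))  # bill%20jones
```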

Second pass gives me the results I need...

index=proxy 
| lookup UserLookup clientip as clientip 
| transaction clientip maxpause=1m 
| stats count as sessions sum(duration) as duration sum(bytes) as bytes by username clientip


JovanMilosevic
Path Finder

I did eventually get there (nearly).

sourcetype=squid 
| transaction clientip maxpause=1m mvlist=t 
| eval username=mvfilter(match(username, "[^\-]")) 
| eval username=mvindex(username,0) 
| eval clientip=mvindex(clientip,0) 
| where username!="" 
| stats count as sessions sum(duration) as duration sum(bytes) as bytes by username clientip
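The transaction clientip maxpause=1m step can be pictured as grouping a client IP's events into sessions, starting a new session whenever the gap between consecutive events exceeds 60 seconds. A sketch of that grouping (Python, not SPL; the timestamps are made up):

```python
# Sketch (Python, not SPL) of transaction-style sessionization: split a
# client's sorted timestamps into sessions wherever the gap between
# consecutive events exceeds maxpause (60 seconds here).
def sessionize(times, maxpause=60.0):
    sessions = []
    for t in sorted(times):
        if sessions and t - sessions[-1][-1] <= maxpause:
            sessions[-1].append(t)   # continue the current session
        else:
            sessions.append([t])     # gap too big: start a new session
    return sessions

# Hypothetical timestamps: two short bursts separated by a long idle gap.
times = [0.0, 20.0, 45.0, 3600.0, 3625.0]
sessions = sessionize(times)
durations = [s[-1] - s[0] for s in sessions]
print(len(sessions), durations)  # 2 [45.0, 25.0]
```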

This appears to work so long as there's at least one non-"-" user in the transaction. If there isn't, the transaction gets dropped.


JovanMilosevic
Path Finder

Thanks. This stops the transactions getting dropped, and ensures that any daily stats produced on download volumes are now correct.


dart
Splunk Employee

If you change your where username!="" to eval username=coalesce(username, "NONE") does that give you your final desired result?
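coalesce simply returns its first non-null argument, so transactions with no real user keep the label "NONE" rather than being discarded by the where clause. In Python terms (a sketch, not SPL):

```python
# Sketch (Python, not SPL): coalesce(username, "NONE") returns the first
# non-null argument, so user-less transactions get a label instead of
# being dropped.
def coalesce(*args):
    return next((a for a in args if a is not None), None)

print(coalesce(None, "NONE"))            # NONE
print(coalesce("bill%20jones", "NONE"))  # bill%20jones
```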


dart
Splunk Employee

Does a search like this do the trick?

sourcetype=squid 
| rex field=cs_user "(?<user>[^-].*)" 
| stats range(_time) as duration first(user) as user by c_ip

(I'm assuming your existing user field is cs_user and the source ip is c_ip)
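Two details of this search, sketched in Python (not SPL): the rex pattern only captures cs_user values that don't start with "-", and range(_time) is simply max minus min of the timestamps per c_ip.

```python
import re

# Sketch (Python, not SPL). The pattern "(?<user>[^-].*)" captures the
# value only when its first character isn't "-", so bare "-" entries
# yield no user field at all.
pattern = re.compile(r"(?P<user>[^-].*)")
m1 = pattern.match("bill%20jones")
m2 = pattern.match("-")
print(m1.group("user") if m1 else None)  # bill%20jones
print(m2.group("user") if m2 else None)  # None

# range(_time) is max - min, so two 2-minute bursts at 09:00 and 17:00
# come out as one span covering the whole gap between them.
times = [9 * 3600, 9 * 3600 + 120, 17 * 3600, 17 * 3600 + 120]
print(max(times) - min(times))  # 28920 seconds, i.e. 8h 2m
```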

JovanMilosevic
Path Finder

Thanks for taking the time to respond. I'm afraid this won't crack it. If someone browses for 2 minutes at 09:00, then does nothing until another 2 minutes at 17:00, this search will have them as browsing all day.
