Hi there,
How do I write a report that parses a log file and tells me which devices have accessed my website?
Example line from source file:
9/17/2012 8:45:18 AM 12.23.34.45 Mozilla/5.0 (iPhone; CPU iPhone OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko)
I need a report which will say:
iPhone 24%
Blackberry 2%
Windows 15%
I would like to define the devices like in the search field:
source="/Users/me/extendedlog.txt" iphone
Thanks in advance
There is an app that provides a dynamic lookup for user agent strings; it is called TA-uas_parser. Download it from
http://apps.splunk.com/app/1007
It's free. It should help you parse out the devices.
OK, so first you need to extract the fields. You can try this in the search field as a rex statement before committing it to config files.
... | rex "^(?:[\S]* ){4}(?<ua>.*)\s\w+$"
That should give you the various user agents in a field called ua. Then comes the tricky part: matching a particular user agent (or set of user agents) to a 'device'. The example below is one way to do this. There may be other, simpler ways, but user agents can look like almost anything. You'll have to fill in strings that match your needs, as this one only matches 'MSIE 7.0', 'MSIE 8.0' and 'Safari'.
... | eval device = case(ua LIKE "%MSIE 7.0%", "IE7", ua LIKE "%MSIE 8.0%","IE8", ua LIKE "%Safari%","Apple")
Then you can do stuff like:
... | top 10 device
or
... | stats c by device
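Putting the pieces together, a rough sketch of the whole report could look like the search below. The match strings ("iPhone", "BlackBerry", "Windows") and the catch-all "Other" label are only assumptions based on the devices named in the question, so adjust them to your data. Note that LIKE is case-sensitive, and that the rex expects each event to end with a single trailing word (as in the SIMAPP samples); events without one will get no ua field and fall into "Other".

```
source="/Users/me/extendedlog.txt"
| rex "^(?:[\S]* ){4}(?<ua>.*)\s\w+$"
| eval device = case(ua LIKE "%iPhone%", "iPhone", ua LIKE "%BlackBerry%", "Blackberry", ua LIKE "%Windows%", "Windows", true(), "Other")
| top limit=0 device
```

top adds a count and a percent column for each device, which is close to the "iPhone 24%" style of report asked for. limit=0 lists every device rather than only the top 10.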
Hope this helps,
Kristian
Thanks, I'll try that and let you know 🙂
edit; typo + some extra info.
Hi there,
SIMAPP could be another word, but just a word not a string with spaces.
Thanks
So it's just the timestamp, IP, User-agent, string?
And in these cases you want to label this as IE7?
Unfortunately for you, the log seems to be whitespace separated, and the user_agent contains whitespace...
What does the string SIMAPP stand for? Is it always SIMAPP or could it be anything (including strings with spaces)?
/k
9/5/2012 12:43:22 PM 84.241.141.114 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET4.0C; .NET4.0E; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729); SIMAPP
9/5/2012 12:45:12 PM 84.241.141.114 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET4.0C; .NET4.0E; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729); SIMAPP
The problem will be to determine how you want to parse the User_Agent into a 'device' - i.e. something that would make sense.
Given that User-agents differ wildly, there is no definite way to do this.
However, your logs may be 'nicer' and more predictable than the average internet-facing web server. Please provide some more sample events.
/k
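If you would rather define devices the way you would type them in the search field, one possible alternative is the searchmatch eval function, which tests each event against a search string. This is only a sketch: the search terms and labels are assumptions taken from the question, and because it matches against the whole event it does not depend on extracting the user agent cleanly from the whitespace-separated line.

```
source="/Users/me/extendedlog.txt"
| eval device = case(searchmatch("iphone"), "iPhone", searchmatch("blackberry"), "Blackberry", searchmatch("windows"), "Windows", true(), "Other")
| top limit=0 device
```

As with a normal search, the terms are matched case-insensitively. The order of the case() arguments matters, since the first matching clause wins.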