Extracting devices that accessed my website

brownd92
New Member

Hi there,
How do I write a report that parses a log file and tells me which devices have accessed my website?
Example line from source file:

9/17/2012 8:45:18 AM 12.23.34.45 Mozilla/5.0 (iPhone; CPU iPhone OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko)

I need a report which will say:
iPhone 24%
Blackberry 2%
Windows 15%

I would like to define the devices myself, the way I would in the search field, e.g.:
source="/Users/me/extendedlog.txt" iphone

Thanks in advance

lguinn2
Legend

There is an app that provides a dynamic lookup for user agent strings; it is called TA-uas_parser. Download it from

http://apps.splunk.com/app/1007

It's free. It should help you parse out the devices.

kristian_kolb
Ultra Champion

Ok, so first you need to extract the fields; you can try this in the search field as a rex statement before committing it to config files.

 ... | rex "^(?:[\S]* ){4}(?<ua>.*)\s\w+$" 

That should give you the various user-agents in a field called ua. Then comes the tricky part: matching a particular user-agent (or set of user-agents) to a 'device'. The example below is one way to do this; there may be other, simpler ways, but the nature of user-agents is that they can look like almost anything. You'll have to fill in strings that match your needs, as this only matches the strings 'MSIE 7.0', 'MSIE 8.0' and 'Safari'.

... | eval device = case(ua LIKE "%MSIE 7.0%", "IE7", ua LIKE "%MSIE 8.0%","IE8", ua LIKE "%Safari%","Apple") 
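As a rough sketch of how this could be adapted to the devices in the original question: case() checks its conditions in order and returns the value for the first one that matches, and an event that matches none of them gets no device field at all, so it drops out of top and stats. The match strings below ('iphone', 'blackberry', 'windows') and the catch-all label 'Other' are assumptions to adjust against your own data; lower() is used so that the match does not depend on letter case.

```
... | eval device = case(
        lower(ua) LIKE "%iphone%", "iPhone",
        lower(ua) LIKE "%blackberry%", "Blackberry",
        lower(ua) LIKE "%windows%", "Windows",
        true(), "Other")
```

Because the first match wins, put the more specific patterns before the more general ones.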

Then you can do stuff like:

 ... | top 10 device

or

... | stats c by device
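Since the original question asks for percentages: top already adds count and percent columns to its output. If you prefer stats, a minimal sketch could compute the share per device like this (the field names total and percent are just illustrative):

```
... | stats count by device
    | eventstats sum(count) as total
    | eval percent = round(count / total * 100, 1)
    | sort - percent
    | fields device, percent
```

Here eventstats adds the overall event count to every row, so each row can be divided by it.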

Hope this helps,

Kristian

brownd92
New Member

Thanks, I'll try that and let you know 🙂

kristian_kolb
Ultra Champion

Edit: fixed a typo and added some extra info.

brownd92
New Member

Hi there,
SIMAPP could be another word, but it is always a single word, never a string with spaces.

Thanks

kristian_kolb
Ultra Champion

So it's just the timestamp, IP, User-agent, string?

And in these cases you want to label this as IE7?

Unfortunately for you, the log seems to be whitespace separated, and the user_agent contains whitespace...
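One possible way around the whitespace problem, assuming the first four space-separated tokens are always date, time, AM/PM and IP address as in the samples: capture everything after them into ua, trailing word included. Because the case()/LIKE approach matches substrings, an extra word such as SIMAPP at the end should not affect the device label.

```
... | rex "^(?:\S+ ){4}(?<ua>.*)$"
```

This is only a sketch; if the trailing word needs to be its own field, the pattern would have to be extended.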

What does the string SIMAPP stand for? Is it always SIMAPP or could it be anything (including strings with spaces)?

/k

brownd92
New Member

9/5/2012 12:43:22 PM 84.241.141.114 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET4.0C; .NET4.0E; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729); SIMAPP
9/5/2012 12:45:12 PM 84.241.141.114 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET4.0C; .NET4.0E; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729); SIMAPP

kristian_kolb
Ultra Champion

The problem will be to determine how you want to parse the User_Agent into a 'device' - i.e. something that would make sense.

Given that User-agents differ wildly, there is no definitive way to do this.

However, your logs may be 'nicer' and more predictable than the average internet-facing web server. Please provide some more sample events.

/k
