Splunk Search

Extracting the User-Agent HTTP header from an Apache log

Armyeric
Path Finder

Looking at all the posts regarding User-Agent HTTP header searches, one of the commonalities is that they were told to change their format to Combined Log Format. I unfortunately cannot do that but I am still being asked to create a dashboard reports to show most common OS used and most common browser. Here is a log:

XX.XX.XX.XX - - [30/Jul/2013:15:16:40 -0700] 0 "GET /portal-web/images/denied.png HTTP/1.1" 200 882 "htps://ABC.ABC.com/portal-web/stuff/stuff.action" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.0)"

Ultimately I want separate count columns for browser type and OS type. How do I go about extracting the info I want? I believe I need to use a Regex statement, but I am unsure on how to proceed especially since both the client and browser are going to change in size?

Tags (1)
0 Karma

grijhwani
Motivator

A pure regex is not going to do it alone. If you are a novice you can get some help for yourself by using the interactive field extraction creator. It is one of the options in the per-record drop down.

alt text

The difficulty is that there is no defined order or format for sub fields of the UA. I just tried myself with the following sample list culled from recent access logs for the generator to weave its magic on:

Windows NT 5.1
Linux x86_64
Windows NT 6.0
Android 4.1.2
Windows Phone OS 7.5
Windows NT 6.1

The resulting sample extractions it offered were:

Linux x86_64
Windows NT 5.1
+http://yandex.com/bots)" RU
Windows NT 5.1)" US
http://www.majestic12.co.uk/bot.php?+)"; US
rv:17.0) Gecko/20130626 Firefox/17.0 Iceweasel/17.0.7" FR
+http://www.exabot.com/go/robot)" FR
Windows NT 6.2
Mail.RU_Bot/2.0
Windows NT 6.0)" JP
Windows NT 6.1
Windows NT 6.0)" CN
+http://www.google.com/bot.html)" US
Android 4.1.2
+http://www.bing.com/bingbot.htm)" US
+http://www.baidu.com/search/spider.html)" CN
Windows Phone OS 7.5

Even after some manual refinement it continues to miss the mark more than hit it.

0 Karma

gkanapathy
Splunk Employee
Splunk Employee

Correct. There is no way to do this just by parsing. UA strings are not strongly-specified, they are mostly suggestive. If you need great accuracy, you must use a lookup that maps known patterns to the item you want. (I mean, technically, you can probably write a regex that includes all the logic of a lookup table, but it would be an impractically enormous regex, so let's just say you can't.)

jstockamp
Communicator

You need to either build a lookup table or use a custom command to parse the user agent string. Looks like this might do the trick:

http://splunk-base.splunk.com/apps/48017/ta-uas_parser

gkanapathy
Splunk Employee
Splunk Employee

If you want to job done right, you pretty much need an application. There is no simple way to parse a UA string. It requires either a massive lookup, or a combination of complex logic and a slightly-less-massive lookup. If you have a limited number of UA strings, your best bet is to simply enumerate them all into your own lookup, then set any others to "other" or something.

0 Karma

Armyeric
Path Finder

I would love to use an app, but our Admin doesn't want to use any apps...so I am stuck.

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...