Splunk Search

Extracting the User-Agent HTTP header from an Apache log

Armyeric
Path Finder

Looking at all the posts regarding User-Agent HTTP header searches, one of the commonalities is that they were told to change their format to Combined Log Format. I unfortunately cannot do that but I am still being asked to create a dashboard reports to show most common OS used and most common browser. Here is a log:

XX.XX.XX.XX - - [30/Jul/2013:15:16:40 -0700] 0 "GET /portal-web/images/denied.png HTTP/1.1" 200 882 "htps://ABC.ABC.com/portal-web/stuff/stuff.action" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.0)"

Ultimately I want separate count columns for browser type and OS type. How do I go about extracting the info I want? I believe I need to use a Regex statement, but I am unsure on how to proceed especially since both the client and browser are going to change in size?

Tags (1)
0 Karma

grijhwani
Motivator

A pure regex is not going to do it alone. If you are a novice you can get some help for yourself by using the interactive field extraction creator. It is one of the options in the per-record drop down.

alt text

The difficulty is that there is no defined order or format for sub fields of the UA. I just tried myself with the following sample list culled from recent access logs for the generator to weave its magic on:

Windows NT 5.1
Linux x86_64
Windows NT 6.0
Android 4.1.2
Windows Phone OS 7.5
Windows NT 6.1

The resulting sample extractions it offered were:

Linux x86_64
Windows NT 5.1
+http://yandex.com/bots)" RU
Windows NT 5.1)" US
http://www.majestic12.co.uk/bot.php?+)"; US
rv:17.0) Gecko/20130626 Firefox/17.0 Iceweasel/17.0.7" FR
+http://www.exabot.com/go/robot)" FR
Windows NT 6.2
Mail.RU_Bot/2.0
Windows NT 6.0)" JP
Windows NT 6.1
Windows NT 6.0)" CN
+http://www.google.com/bot.html)" US
Android 4.1.2
+http://www.bing.com/bingbot.htm)" US
+http://www.baidu.com/search/spider.html)" CN
Windows Phone OS 7.5

Even after some manual refinement it continues to miss the mark more than hit it.

0 Karma

gkanapathy
Splunk Employee
Splunk Employee

Correct. There is no way to do this just by parsing. UA strings are not strongly-specified, they are mostly suggestive. If you need great accuracy, you must use a lookup that maps known patterns to the item you want. (I mean, technically, you can probably write a regex that includes all the logic of a lookup table, but it would be an impractically enormous regex, so let's just say you can't.)

jstockamp
Communicator

You need to either build a lookup table or use a custom command to parse the user agent string. Looks like this might do the trick:

http://splunk-base.splunk.com/apps/48017/ta-uas_parser

gkanapathy
Splunk Employee
Splunk Employee

If you want to job done right, you pretty much need an application. There is no simple way to parse a UA string. It requires either a massive lookup, or a combination of complex logic and a slightly-less-massive lookup. If you have a limited number of UA strings, your best bet is to simply enumerate them all into your own lookup, then set any others to "other" or something.

0 Karma

Armyeric
Path Finder

I would love to use an app, but our Admin doesn't want to use any apps...so I am stuck.

0 Karma
Get Updates on the Splunk Community!

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...