Looking at all the posts about searching the User-Agent HTTP header, one thing they have in common is that the posters were told to change their log format to Combined Log Format. Unfortunately I cannot do that, but I am still being asked to create dashboard reports showing the most common OS and the most common browser. Here is a sample log entry:

XX.XX.XX.XX - - [30/Jul/2013:15:16:40 -0700] 0 "GET /portal-web/images/denied.png HTTP/1.1" 200 882 "htps://ABC.ABC.com/portal-web/stuff/stuff.action" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.0)"

Ultimately I want separate count columns for browser type and OS type. How do I go about extracting the information I want? I believe I need a regex, but I am unsure how to proceed, especially since both the client (OS) and browser portions of the string vary in length.
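Pulling out the whole User-Agent field is the easy part, even though its length varies: in the sample line it is the last double-quoted field, so a pattern anchored to the end of the line captures it no matter how long it or the earlier fields are. The following is a minimal Python sketch under that assumption (the function name `extract_user_agent` is hypothetical). Splitting the captured string into OS and browser is the hard part, which the answers below discuss.

```python
import re
from typing import Optional

# Assumption: the User-Agent is the last double-quoted field on the line,
# as in the sample above. Anchoring at the end ($) means the pattern does
# not care how long the UA or any earlier field is.
UA_RE = re.compile(r'"([^"]*)"\s*$')


def extract_user_agent(line: str) -> Optional[str]:
    """Return the User-Agent string from one access-log line, or None."""
    match = UA_RE.search(line)
    return match.group(1) if match else None


sample = ('XX.XX.XX.XX - - [30/Jul/2013:15:16:40 -0700] 0 '
          '"GET /portal-web/images/denied.png HTTP/1.1" 200 882 '
          '"htps://ABC.ABC.com/portal-web/stuff/stuff.action" '
          '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.0)"')
print(extract_user_agent(sample))
# Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.0)
```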

asked 30 Jul '13, 15:30 by Armyeric


2 Answers:

You need to either build a lookup table or use a custom command to parse the user agent string. Looks like this might do the trick:

TA-uas_parser on Splunkbase

answered 30 Jul '13, 15:36 by jstockamp

I would love to use an app, but our admin doesn't want to install any apps, so I am stuck.

(30 Jul '13, 15:40) Armyeric

If you want the job done right, you pretty much need an application. There is no simple way to parse a UA string. It requires either a massive lookup, or a combination of complex logic and a slightly less massive lookup. If you have a limited number of UA strings, your best bet is to simply enumerate them all into your own lookup, then set any others to "other" or something similar.

(30 Jul '13, 19:10) gkanapathy ♦
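The enumerate-and-fall-back approach described above can be sketched in Python as a plain dictionary lookup. This is an illustration, not a Splunk lookup definition: `UA_LOOKUP`, `classify`, and `count_os_and_browser` are hypothetical names, and the single table entry is the UA string from the question (Windows NT 5.0 is Windows 2000; MSIE 7.0 is Internet Explorer 7). Any UA string that has not been enumerated is counted as "other".

```python
from collections import Counter

# Hypothetical hand-maintained lookup: each UA string you have actually seen
# maps to an (os, browser) pair. The one entry is the UA from the question.
UA_LOOKUP = {
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.0)": ("Windows 2000", "Internet Explorer 7"),
}


def classify(ua):
    """Return (os, browser) for a known UA string, else ("other", "other")."""
    return UA_LOOKUP.get(ua, ("other", "other"))


def count_os_and_browser(user_agents):
    """Tally OS and browser separately, one Counter per report column."""
    os_counts, browser_counts = Counter(), Counter()
    for ua in user_agents:
        os_name, browser = classify(ua)
        os_counts[os_name] += 1
        browser_counts[browser] += 1
    return os_counts, browser_counts


os_counts, browser_counts = count_os_and_browser([
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.0)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.0)",
    "SomeUnknownAgent/1.0",  # hypothetical UA, not enumerated, so "other"
])
print(os_counts.most_common())  # [('Windows 2000', 2), ('other', 1)]
```

The table would be populated from the distinct UA strings already in your logs, which is only practical when that set is small, as the comment above notes.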

A regex alone is not going to do it. If you are a novice, you can get some help by using the interactive field extractor; it is one of the options in the per-record drop-down menu.

The difficulty is that there is no defined order or format for the subfields of the UA. I just tried it myself, giving the generator the following sample values culled from recent access logs to weave its magic on:

Windows NT 5.1
Linux x86_64
Windows NT 6.0
Android 4.1.2
Windows Phone OS 7.5
Windows NT 6.1

The resulting sample extractions it offered were:

Linux x86_64
Windows NT 5.1
+http://yandex.com/bots)" RU
Windows NT 5.1)" US
http://www.majestic12.co.uk/bot.php?+)" US
rv:17.0) Gecko/20130626 Firefox/17.0 Iceweasel/17.0.7" FR
+http://www.exabot.com/go/robot)" FR
Windows NT 6.2
Mail.RU_Bot/2.0
Windows NT 6.0)" JP
Windows NT 6.1
Windows NT 6.0)" CN
+http://www.google.com/bot.html)" US
Android 4.1.2
+http://www.bing.com/bingbot.htm)" US
+http://www.baidu.com/search/spider.html)" CN
Windows Phone OS 7.5

Even after some manual refinement, it still misses the mark more often than it hits.

answered 30 Jul '13, 17:25 by grijhwani

Correct. There is no way to do this just by parsing. UA strings are not strongly specified; they are mostly suggestive. If you need great accuracy, you must use a lookup that maps known patterns to the item you want. (Technically, you could probably write a regex that includes all the logic of a lookup table, but it would be an impractically enormous regex, so let's just say you can't.)

(30 Jul '13, 19:08) gkanapathy ♦

Copyright © 2005-2014 Splunk Inc. All rights reserved.