
Regex for URL parsing

ChhayaV
Communicator

Hi,

I want to extract URLs from the events as a separate field.

Here is a sample from the log file:

04/15/2013 17:51:58.09  w3wp.exe (0x113C)                           0x3D50  SharePoint Foundation           Monitoring                      nasq    Medium      Entering monitored scope (Request (GET:https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx))
04/15/2013 17:51:58.26  w3wp.exe (0x113C)                           0x4AA0  SharePoint Foundation           Monitoring                      nasq    Medium      Entering monitored scope (Request (GET:https://www.abc.co.in:443/PublicSite/images/header.jpg)) 
04/15/2013 17:59:25.20  w3wp.exe (0x113C)                           0x14B0  SharePoint Foundation           Monitoring                      nasq    Medium      Entering monitored scope (Request (GET:https://www.abc.co.in:443/_LAYOUTS/ClientPortal/SilverlightWebParts/PROD/MyBenefits.xap?ver=5.19))

I only want to extract the URLs of .aspx and .xap pages, like:
https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx
https://www.abc.co.in:443/_LAYOUTS/ClientPortal/SilverlightWebParts/PROD/MyBenefits.xap?ver=5.19

If I write the regex as (?i)\(GET:(?P<FIELDNAME>[^\?]+), the URL is not extracted properly.

Please help with the regex.
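To see what that pattern actually captures, here is a minimal sketch that uses Python's `re` module as a stand-in for Splunk's PCRE engine (an assumption; the two behave the same for these simple constructs). It runs against the message part of the three sample events only. Because `[^\?]+` stops only at a `?`, it swallows the closing `))` (and the trailing space) whenever there is no query string, and it also matches the `.jpg` request.

```python
import re

# Message portion of the three sample events above
# (the timestamp/process columns are omitted for brevity).
EVENTS = [
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx))",
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/PublicSite/images/header.jpg)) ",
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/_LAYOUTS/ClientPortal/SilverlightWebParts/PROD/MyBenefits.xap?ver=5.19))",
]

# The pattern from the question; Python's re spells named groups (?P<name>...).
ASKER = re.compile(r"(?i)\(GET:(?P<FIELDNAME>[^\?]+)")

results = []
for event in EVENTS:
    m = ASKER.search(event)
    # [^\?]+ only stops at a "?", so without a query string it runs to the end of the line.
    results.append(m.group("FIELDNAME") if m else None)

for value in results:
    print(repr(value))
```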


MHibbin
Influencer

I'm not sure your second example is an .aspx file (I'm not a web developer), but the following regex will capture the URLs that end in ".aspx":

"GET:\w+://(?P<url>[^\)]+\.aspx)"

You can try out regular expressions on the following site; it's a handy tool:

http://gskinner.com/RegExr/

Hope this helps.
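A quick way to check this pattern is a small Python sketch (Python's `re` stands in for Splunk's PCRE here, which is an assumption). Note that the named group starts after `\w+://`, so the captured value leaves out the `https://` scheme. The `WITH_SCHEME` variant is hypothetical, not from this thread: it only moves the start of the group so the scheme is kept.

```python
import re

# Message portion of the three sample events from the question.
EVENTS = [
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx))",
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/PublicSite/images/header.jpg)) ",
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/_LAYOUTS/ClientPortal/SilverlightWebParts/PROD/MyBenefits.xap?ver=5.19))",
]

# The pattern from the answer: the group begins after "\w+://", so the scheme is dropped.
ANSWER = re.compile(r"GET:\w+://(?P<url>[^\)]+\.aspx)")

# Hypothetical variant: the group starts right after "GET:", so "https://" is kept.
WITH_SCHEME = re.compile(r"GET:(?P<url>\w+://[^\)]+\.aspx)")


def extract(pattern, events):
    """Return the 'url' group of the first match in each event, or None."""
    found = []
    for event in events:
        m = pattern.search(event)
        found.append(m.group("url") if m else None)
    return found


urls = extract(ANSWER, EVENTS)
full_urls = extract(WITH_SCHEME, EVENTS)
print(urls)
print(full_urls)
```

Only the .aspx event matches; the .jpg and .xap events return None, as expected for a pattern that requires ".aspx".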


ChhayaV
Communicator

Hi,
I want to restrict my regex to the first match only.

Leaving Monitored Scope (Request (GET:https://www.abc/_layouts/ClientPortal/abc/CustomPages/LoginPage.aspx?ReturnUrl=%2f_layouts%2fAuthent...). Execution Time=17.1800154751023
If this is my log entry, I should get only "LoginPage.aspx", but the result is "LoginPage.aspx?ReturnUrl=%2f_layouts%2fAuthenticate.aspx".
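This is what greedy matching does: `[^\)]+` first runs up to the closing `)` and then backs off only as far as the last `.aspx`. One possible fix, not one proposed in this thread, is a lazy quantifier (`+?`, which PCRE also supports), so the match stops at the first `.aspx`. The sketch below uses Python's `re` as a stand-in for Splunk's engine, and a simplified, hypothetical event, because the posted entry is truncated.

```python
import re

# Hypothetical, simplified event: the real entry above is truncated ("Authent..."),
# so this string ends at the part quoted in the result and adds a closing "))".
EVENT = (
    "Leaving Monitored Scope (Request (GET:https://www.abc/_layouts/ClientPortal/abc/"
    "CustomPages/LoginPage.aspx?ReturnUrl=%2f_layouts%2fAuthenticate.aspx))"
)

# Greedy: [^\)]+ runs up to the first ")" and then backs off to the LAST ".aspx".
GREEDY = re.compile(r"\(GET:(?P<url>[^\)]+\.aspx)")
# Lazy: [^\)]+? grows one character at a time and stops at the FIRST ".aspx".
LAZY = re.compile(r"\(GET:(?P<url>[^\)]+?\.aspx)")

greedy_url = GREEDY.search(EVENT).group("url")
lazy_url = LAZY.search(EVENT).group("url")
print(greedy_url)
print(lazy_url)
```

Excluding `?` from the character class (`[^\)\?]+`) would be another way to stop before the query string.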


burkmat
Engager

All current answers rely on the HTTP request being a GET request. HTTP has several methods (GET, POST and HEAD being the most common), and if you want to capture all URLs, you need to take this into consideration.

The following regex is probably a better choice for catching all HTTP methods, and all URLs regardless of unusual formats (assuming no query parameters are appended to the URL; if there are, you need to account for them).

(?i)\(Request \([A-Z]+:(?<fieldname>.*\.(aspx|xap))\)\)$
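A minimal Python sketch of this pattern (Python's `re` as a stand-in for Splunk's PCRE, with the group written as `(?P<fieldname>...)` because that is the spelling Python's `re` expects). The POST event is hypothetical, added only to show that other methods match. The `.xap` event with `?ver=5.19` does not match, because `\)\)$` requires the URL to end in `.aspx` or `.xap` immediately before the closing parentheses at the end of the line.

```python
import re

# The pattern from the answer above, with the named group in Python's (?P<...>) form.
PATTERN = re.compile(r"(?i)\(Request \([A-Z]+:(?P<fieldname>.*\.(aspx|xap))\)\)$")

EVENTS = {
    "get_aspx": "Entering monitored scope (Request (GET:https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx))",
    # Hypothetical POST event, not from the original log.
    "post_aspx": "Entering monitored scope (Request (POST:https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx))",
    # Query string after ".xap": the "))$" anchor no longer follows the extension, so no match.
    "xap_with_query": "Entering monitored scope (Request (GET:https://www.abc.co.in:443/_LAYOUTS/ClientPortal/SilverlightWebParts/PROD/MyBenefits.xap?ver=5.19))",
}

fields = {}
for name, event in EVENTS.items():
    m = PATTERN.search(event)
    fields[name] = m.group("fieldname") if m else None
    print(name, "->", fields[name])
```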

Ayn
Legend

The regex should cover that. It does not cover parameters, though, as burkmat said.


ChhayaV
Communicator

Hi,
It's working, but how can I extract word.aspx, word.word.word.xap, word.xap and all other possible combinations of words and dots (.)?


kristian_kolb
Ultra Champion

This should work:

rex "\(GET:(?<fieldname>[^\)]+\.(xap|aspx))"
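As an illustration, here is a Python sketch that applies the same pattern (Python's `re` as a stand-in for Splunk's PCRE, with the group written as `(?P<fieldname>...)`). The fourth event is hypothetical, added to show a dotted "word.word.word.xap" name. Because `[^\)]+` accepts dots, dotted names are captured whole, and backtracking stops at the last `.xap` or `.aspx` before the closing `)`, which drops a trailing `?ver=...` query string.

```python
import re

# The rex pattern from the answer, with the named group in Python's (?P<...>) form.
PATTERN = re.compile(r"\(GET:(?P<fieldname>[^\)]+\.(xap|aspx))")

EVENTS = [
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx))",
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/PublicSite/images/header.jpg)) ",
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/_LAYOUTS/ClientPortal/SilverlightWebParts/PROD/MyBenefits.xap?ver=5.19))",
    # Hypothetical event (not from the original log) with a dotted file name.
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/apps/My.Benefits.Viewer.xap))",
]

fieldnames = []
for event in EVENTS:
    m = PATTERN.search(event)
    # Greedy [^\)]+ backs off to the rightmost ".xap"/".aspx" before the first ")".
    fieldnames.append(m.group("fieldname") if m else None)

for value in fieldnames:
    print(value)
```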
