I am looking for a way to extract the filenames of executable files from URLs in proxy logs. The url field in my logs contains the full URL. Here are a few examples. I think we just need to capture everything past the last "/" if it has 3 or 4 characters after the last ".". Has anyone done anything like this?
url=http://www.kaco.net/download/kacotv.exe
url=http://acroipm2.adobe.com/15/rdr/ENU/win/nooem/none/consumer/message.zip
url=https://prod308-client.redplum.com/protocol/install/P@H_prod308-1dF7CZ5x.exe
url=http://download.microsoft.com/download/5/3/D/53D3880B-25F8-4714-A4AC-E463A492F96E/41212.00/Silverlight_x64.exe
url=http://download.flv.com/kits/flvd/flvdownloader_setup.exe
Try this run-anywhere sample. This is the regex I use for any URL-related field extraction, since it pulls out other parts of the URL as well:
| gentimes start=-1 | eval url="http://www.kaco.net/download/kacotv.exe" | rex field=url "(?P<requestedUrl>(?P<path>\/(((?P<contextRoot>[^\/]+))(\S+\/)*(?P<filename>[^\/\?;=\s]+)([^\s]*))))"
Replace the "| gentimes ... | eval url ..." portion with your base search.
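To see what the named groups capture without running a search, here is a minimal Python sketch of the same pattern. It assumes Python's re module behaves like Splunk's regex engine for this expression; the helper name extract_filename is made up for illustration, and the URLs are the ones from the question.

```python
import re

# Same pattern as in the rex command above, split across two lines for readability.
# Python's re module accepts the (?P<name>...) named-group syntax used here.
URL_PATTERN = re.compile(
    r"(?P<requestedUrl>(?P<path>\/(((?P<contextRoot>[^\/]+))(\S+\/)*"
    r"(?P<filename>[^\/\?;=\s]+)([^\s]*))))"
)

# Sample URLs taken from the question.
SAMPLE_URLS = [
    "http://www.kaco.net/download/kacotv.exe",
    "http://acroipm2.adobe.com/15/rdr/ENU/win/nooem/none/consumer/message.zip",
    "https://prod308-client.redplum.com/protocol/install/P@H_prod308-1dF7CZ5x.exe",
    "http://download.microsoft.com/download/5/3/D/53D3880B-25F8-4714-A4AC-E463A492F96E/41212.00/Silverlight_x64.exe",
    "http://download.flv.com/kits/flvd/flvdownloader_setup.exe",
]


def extract_filename(url):
    """Return the 'filename' group (hypothetical helper), or None if nothing matches."""
    match = URL_PATTERN.search(url)
    return match.group("filename") if match else None


if __name__ == "__main__":
    for url in SAMPLE_URLS:
        print(extract_filename(url))
```

Note that when the field holds a full URL, as it does here, the contextRoot group ends up capturing the host name (www.kaco.net for the first sample), because the match starts at the second "/" of "http://".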
Like this:
... | rex field=url "^.*\/(?<programname>[^\.\/]+\.(?:[^\.\/]){3,4})$"
Hi woodcock,
Thank you for your speedy reply. I tried copying and pasting your solution into Splunk, and it doesn't return any results.
...| rex field=url ".^.*\/(?<filename>[^\.\/]+\.(?:[^\.\/]){3,4})$" | top filename
Any ideas on what I could be missing?
There was an extra period (".") at the start of the RegEx. I have fixed it; try again.
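For anyone who wants to see why the leading period mattered, here is a small Python sketch that compares the two patterns. It is only an approximation of what Splunk does: Python spells named groups (?P<name>...) rather than (?<name>...), and the helper name program_name and the example.com URL are made up for illustration.

```python
import re

# Pattern with the stray leading period: "." must consume one character first,
# after which "^" (start of string) can no longer be satisfied, so nothing matches.
BROKEN = re.compile(r".^.*\/(?P<programname>[^\.\/]+\.(?:[^\.\/]){3,4})$")

# Corrected pattern: everything after the last "/" that looks like
# name + "." + a 3 or 4 character extension, anchored at the end of the URL.
FIXED = re.compile(r"^.*\/(?P<programname>[^\.\/]+\.(?:[^\.\/]){3,4})$")


def program_name(pattern, url):
    """Return the 'programname' group (hypothetical helper), or None if no match."""
    match = pattern.search(url)
    return match.group("programname") if match else None


if __name__ == "__main__":
    url = "http://download.flv.com/kits/flvd/flvdownloader_setup.exe"
    print(program_name(BROKEN, url))  # no match, so None
    print(program_name(FIXED, url))   # flvdownloader_setup.exe

    # The filename part may contain only one ".", so a name like this is skipped.
    print(program_name(FIXED, "http://example.com/files/archive.tar.gz"))  # None
```

Because the pattern is anchored with "$", a URL that carries a query string after the filename will not match either. It also matches any 3 or 4 character extension, not just executables, so a filter on the extracted value may still be needed.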