Splunk Search

How to create a regex to match URLs ending with a known file extension downloads?

jkumarr2
New Member

I am trying to filter out all URLs which are for file downloads and those URLs will end with the file extension. Eg - a zip, doc, xls,docx, xlsx, py file downloaded from the internet

I tried with regex like --> splunk search query...| regex url="//.+?/.+?.(zip|doc|docx|xls|xlsx)$" This regex works and picks up quite a few urls that are ending with the file extensions mentioned in teh regex, can someone provide me with abetter regex or confirm if what i have above is good enough

I have pasted a few Sample values for url field below, but there are a lot of other possible combinations as u can imagine there are so many possible URL combinations on the internet.

http://www.liverpoolfc.com
http://www.blackberry.com
http://www.lflogistics.com/sites/default/files/news/lflstc.pdf
https://www.abc.com/tiny/7uwi2
https://download.abc.com/download/ep/FE-90CRC000-28.zip
http://www3.abce.hk/listedco/listconews/SEHK/2019/0521/LTN20190521894.pdf
https://www.abc.com/review/www.xyz-center.com
https://xyz.abc.com/abc-voyager.php
http://wealthbriefing.com/forms/view.php?id=1456762&element_34=saint.xyz@gmail.com

0 Karma
1 Solution

ips_mandar
Builder

Hi
This would be better way to regex

|rex field=url ".*(zip|doc|docx|xls|xlsx|pdf)$"

Assuming this url is stored in url fieldname

View solution in original post

0 Karma

ips_mandar
Builder

Hi
This would be better way to regex

|rex field=url ".*(zip|doc|docx|xls|xlsx|pdf)$"

Assuming this url is stored in url fieldname

0 Karma
Get Updates on the Splunk Community!

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...