Splunk Search

How to create a regex to match file-download URLs that end with a known file extension?

jkumarr2
New Member

I am trying to pick out all URLs that are file downloads. Those URLs end with the file extension, e.g. a zip, doc, xls, docx, xlsx, or py file downloaded from the internet.

I tried a regex like this --> splunk search query... | regex url="//.+?/.+?.(zip|doc|docx|xls|xlsx)$" This regex works and picks up quite a few URLs that end with the file extensions mentioned in the regex. Can someone provide a better regex, or confirm whether what I have above is good enough?

I have pasted a few sample values for the url field below, but as you can imagine there are many other possible URL combinations on the internet.

http://www.liverpoolfc.com
http://www.blackberry.com
http://www.lflogistics.com/sites/default/files/news/lflstc.pdf
https://www.abc.com/tiny/7uwi2
https://download.abc.com/download/ep/FE-90CRC000-28.zip
http://www3.abce.hk/listedco/listconews/SEHK/2019/0521/LTN20190521894.pdf
https://www.abc.com/review/www.xyz-center.com
https://xyz.abc.com/abc-voyager.php
http://wealthbriefing.com/forms/view.php?id=1456762&element_34=saint.xyz@gmail.com


ips_mandar
Builder

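Hi,
This would be a better way to write the regex:

| regex url="\.(zip|doc|docx|xls|xlsx|pdf)$"

This assumes the URL is stored in a field named url. The regex command is used here because rex is for field extraction and requires at least one named capture group, and the dot is escaped so that it matches only a literal "." before the extension.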
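
To sanity-check a pattern like this outside Splunk, here is a minimal Python sketch that runs it against some of the sample URLs from the question. It is only an illustration, not part of the Splunk search: the helper name is_download, the example.com URLs, and the choice to strip the query string before matching are all assumptions made for this sketch, and "py" is added to the list because the question mentions it.

```python
import re
from urllib.parse import urlsplit

# Extensions from the thread ("py" comes from the question text).
EXTENSIONS = ("zip", "doc", "docx", "xls", "xlsx", "pdf", "py")

# The dot is escaped so it only matches a literal "." before the extension.
DOWNLOAD_RE = re.compile(r"\.(?:%s)$" % "|".join(EXTENSIONS), re.IGNORECASE)

# Pattern with no escaped dot, shown only for comparison.
LOOSE_RE = re.compile(r".*(zip|doc|docx|xls|xlsx|pdf)$")


def is_download(url: str) -> bool:
    """Hypothetical helper: True if the URL path ends in a known extension."""
    path = urlsplit(url).path  # drops any ?query and #fragment
    return DOWNLOAD_RE.search(path) is not None


if __name__ == "__main__":
    samples = [
        "http://www.liverpoolfc.com",
        "http://www.lflogistics.com/sites/default/files/news/lflstc.pdf",
        "https://download.abc.com/download/ep/FE-90CRC000-28.zip",
        "https://xyz.abc.com/abc-voyager.php",
        "http://wealthbriefing.com/forms/view.php?id=1456762&element_34=saint.xyz@gmail.com",
        "https://example.com/tools/gzip",  # made-up URL: ends in "zip" but has no ".zip"
    ]
    for url in samples:
        print(is_download(url), bool(LOOSE_RE.search(url)), url)
```

The made-up gzip URL shows why the literal dot matters: the loose pattern matches it because the text merely ends in "zip", while the escaped-dot pattern does not.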
