Splunk Search

Need rex help with URL

fdevera
Path Finder

Hi, I have this regex and I'm trying to filter for any URL that points to a file with two or more extensions. So far I have this:

^(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/|hxxp:\/\/|hxxps:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$

Any help is appreciated. Thanks!

Sukisen1981
Champion

Hmm, still not sure, but I will give this a try:

    | makeresults
    | eval url="hxxp://static.zipcloud.com/a/zipcloud//img/footer.break.png.exe"
    | rex field=url ".*\/(?<ext>.*)"
    | eval ext=split(ext, ".")
    | eval ext_count=mvcount(ext)

What this does is extract everything after the last /, split it on dots into a multivalue field, and count the values.
In the example above this gives a count of 4, for footer, break, png and exe.
A count of N means the file name contains N-1 dots, so anything with a count of 3 or more has at least 2 dots, something like xx.yyy.zzz.
That's the easy part. Matching against all possible extensions is trickier: you can compare against some common extensions in the rex or with a like function. But I will first wait to hear from you whether this works for you.
For your use, assuming the field is named url, you just need to copy and reuse the code from the rex onwards.
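
The comparison against common extensions could look something like the following. This is only a minimal sketch: the field name last_ext is made up for illustration, and the list (exe, scr) is just taken from the extensions mentioned in this thread, so it would need extending for real use.

```spl
| makeresults
| eval url="hxxp://static.zipcloud.com/a/zipcloud//img/footer.break.png.exe"
| rex field=url ".*\/(?<ext>.*)"
| eval ext=split(ext, ".")
| eval ext_count=mvcount(ext)
| eval last_ext=lower(mvindex(ext, -1))
| where ext_count>=3 AND match(last_ext, "^(exe|scr)$")
```

Here mvindex(ext, -1) picks the last value of the multivalue field, i.e. the final extension, and the where clause keeps only events with at least two dots in the file name whose final extension is on the list.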

fdevera
Path Finder
 | rex field=url ".*\/(?<ext>.*)" 
 | eval ext=split(ext,".")
 | eval ext_count=mvcount(ext)

This works great! So the split tells you how many sections are separated by dots. How do I only display ext_count of 3 or higher? How about 3 exactly?

Thanks!

fdevera
Path Finder
 | rex field=url ".*\/(?<ext>.*)" 
 | eval ext=split(ext,".")
 | eval ext_count=mvcount(ext)
 | search ext_count>=3
 | dedup ext

Got it! Thanks for all your help!

Sukisen1981
Champion

Glad to see you figured it out, @fdevera. Sorry, I am in IST and it was too late at night for me to see your comments.

jacobpevans
Motivator

If you are just aiming to get everything after the last slash, this is the regex:

^.*\/([^\/]+)$

https://regex101.com/r/y0D5rr/1
If you'd like to fine tune it to clarify extensions, you can do something like this:

^.*\/([^\/]+\.(png|pdf|docx|scr|exe))$

https://regex101.com/r/tv2Th5/1

Cheers,
Jacob

If you feel this response answered your question, please do not forget to mark it as such. If it did not, but you do have the answer, feel free to answer your own post and accept that as the answer.

fdevera
Path Finder

I added this:
|rex url="^.*\/([^\/]+)$"

And received this error:

Error in 'rex' command: The regex 'url=^.*\/([^\/]+)$' does not extract anything. It should specify at least one named group. Format: (?<name>...).

jacobpevans
Motivator

Apologies, I was just trying to assist with the regex. If that's the error, here's what you need:

 ^.*\/(?<ThisIsWhatIWantMyFieldNamed>([^\/]+))$
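
Put together as a full rex command, that might look roughly like this. It is a sketch only: index=webproxy and the url field come from the search posted in this thread, while filename is just an example name for the extracted field. Note that rex takes the source field via field=, and the pattern needs at least one named group.

```spl
index=webproxy
| rex field=url "^.*\/(?<filename>[^\/]+)$"
| table url filename
```
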
Cheers,
Jacob

Sukisen1981
Champion

hi @fdevera
can you share a sample event and what all you want to extract?

fdevera
Path Finder

index=webproxy |table url

example output:
hxxp://static.zipcloud.com/a/zipcloud//img/footer.break.png

I only want to display events with a url that has more than one extension. I know this will be difficult because of the random existence of periods, which will cause a lot of false positives, but that's fine. Any ideas to reduce that would be great too.

Sukisen1981
Champion

hi @fdevera
I'm a bit confused about the 'extensions': is it 2 here because footer.break.png contains 2 dots? Or how do you calculate the extensions for this url?

fdevera
Path Finder

I'm looking for direct links to files that have two extensions like .docx.scr or .pdf.exe. What would be the best way to do that in rex? I'm ok with false positives in the results.

Sukisen1981
Champion

Aha, so the example you gave above
hxxp://static.zipcloud.com/a/zipcloud//img/footer.break.png
qualifies as it has break.png, right?

fdevera
Path Finder

What am I doing wrong here?

| rex field=url "^.*\/([^\/]+)$" | table urlrisk_gibson src_host src_ip dst_host dst_ip mwg_client_sent sent user_agent url field10 http_message http_method http_response http_version

jacobpevans
Motivator

As they said, we need to see your data and what you expect to see in order to help you.

Cheers,
Jacob

fdevera
Path Finder

Correct - no way around that since extensions can have more than 3 letters, sometimes 5 or 6. And filenames commonly have periods in them. At the very least I'd like to limit my results to those that have only two periods in the file name.
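
Limiting the results to file names with exactly two periods could be sketched like this. The field names filename and dot_count are made up for illustration; index=webproxy and url are the ones used earlier in this thread.

```spl
index=webproxy
| rex field=url "^.*\/(?<filename>[^\/]+)$"
| eval dot_count=mvcount(split(filename, ".")) - 1
| where dot_count=2
| table url filename
```

One caveat with this approach: a URL with no path at all (for example hxxp://static.zipcloud.com) has its host name captured as the file name, and dots inside a query string are counted too, so both can show up as false positives.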

somesoni2
Revered Legend

Agreed. For questions like this, sample data is required: from the regex alone we can tell what your current regex is doing, but not whether it's doing what you want. Please share which events/values you want to include and which you want to exclude. Please scrub any sensitive data before posting samples.
