Splunk Search

How to parse proxy information to create a list of rootlevel domains and tld?

packet_hunter
Contributor

Scenario:

I am trying to create a list of all the unique domains (from web requests) from the proxy.

Currently I am using:

index=websurf | stats values(dest)

websurf is the index of the proxy logs
dest is the destination IP or domain the user is going to.

The goal is to extract the rootdomain.tld from the results.
I also need a way to separate the IPs (numeric only) from the domains (alphanumeric), and then further parse the alpha numeric subdomain.rootdomain.tld to rootdomain.tld. I am having no luck writing a REX for this, that starts at the tld and grabs the rootdomain.

some results examples are:

09458kf84ks8.subdom.rootdomain.tld
www.yahoo.com
1.bad.blogspot.ru
54.239.127.240
65.media.tumblr.com 
a.abc.com

Ideally I need two searches: one to extract all the IP(s) from the results and another search to extract all the rootdomain.tld(s).

Thank you

0 Karma
1 Solution

packet_hunter
Contributor

sample string
http://a57.foxnews.9.com/www.foxnews.com/ucat/images/121/91/302070_ford_121.jpg
I just want
a57.foxnews.9.com and drop all trailing pages following the /

index=webout | rex field=url "\/\/(?[\w\d.]+)" |stats values(domain)

View solution in original post

0 Karma

packet_hunter
Contributor

sample string
http://a57.foxnews.9.com/www.foxnews.com/ucat/images/121/91/302070_ford_121.jpg
I just want
a57.foxnews.9.com and drop all trailing pages following the /

index=webout | rex field=url "\/\/(?[\w\d.]+)" |stats values(domain)

0 Karma

packet_hunter
Contributor

Any REX KungFu Master... little help

I cannot get this rex to work, it works in the "online regex tester" https://regex101.com/
sample string
http://a57.foxnews.9.com/www.foxnews.com/ucat/images/121/91/302070_ford_121.jpg
I just want
a57.foxnews.9.com and drop all trailing pages following the /
I have tried the following but it keeps erroring out
index=webout | rex field=url "\/\/(?[\w\d.]+)" |stats values(domain)

Any help is much appreciated

0 Karma

packet_hunter
Contributor

never mind it is working now, must have been a glitch

0 Karma
Get Updates on the Splunk Community!

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

Take a look below to explore our upcoming Community Office Hours, Tech Talks, and Webinars this month. This ...

They're back! Join the SplunkTrust and MVP at .conf24

With our highly anticipated annual conference, .conf, comes the fez-wearers you can trust! The SplunkTrust, as ...

Enterprise Security Content Update (ESCU) | New Releases

Last month, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...