
How to write the regex to extract the domains from URLs?

ccsfdave
Builder

I have been through the field extractor, answers.splunk.com, and the interwebs looking for help on this one. Our Palo Alto firewall gives us the URLs of sites visited; here is a sample:

crl.microsoft.com/pki/crl/products/MicRooCerAut2011_2011_03_22.crl
safebrowsing-cache.google.com/
p4-a2lp5grl52xoy-qpo2s4ky6vs36rpb-794312-s1-v6exp3-v4.metric.gstatic.com/
de.tynt.com/deb/v2?id=dZxfWCGner46jsacwqm_6l&r=lyricstranslate.com/en/l039amour-c039est-pour-rien-love-nothing.html
a248.e.akamai.net/

I would like to extract the domains, e.g.:

microsoft or microsoft.com
google or google.com
gstatic or gstatic.com
tynt or tynt.com
akamai or akamai.net

I think the way to go about it is to find the FIRST .com, .net, .org, etc. and then work back to the previous "." to grab the domain, but writing that is beyond me.

Can anyone help?

Accepted solution

somesoni2
Revered Legend

Try this run-anywhere sample:

| gentimes start=-1 | eval URL="crl.microsoft.com/pki/crl/products/MicRooCerAut2011_2011_03_22.crl safebrowsing-cache.google.com/ p4-a2lp5grl52xoy-qpo2s4ky6vs36rpb-794312-s1-v6exp3-v4.metric.gstatic.com/ de.tynt.com/deb/v2?id=dZxfWCGner46jsacwqm_6l&r=lyricstranslate.com/en/l039amour-c039est-pour-rien-love-nothing.html a248.e.akamai.net/" | table URL | makemv URL | mvexpand URL | rex field=URL "(?<domain>\w+\.\w+)\/"
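
To see what the `rex` pattern does, here is a rough Python mirror of it. It is a sketch, not Splunk code: `rex` uses PCRE, and for a pattern this simple Python's `re` should behave the same, except that Python spells the named group `(?P<domain>...)`. `\w+\.\w+` is two runs of letters, digits, or underscores joined by a literal dot, and the trailing `\/` requires a slash right after them. The `domain_from_url` helper is hypothetical.

```python
import re

# The accepted answer's pattern; Python writes the named group as (?P<name>...).
PATTERN = re.compile(r"(?P<domain>\w+\.\w+)\/")

def domain_from_url(url):
    """Hypothetical helper: return the leftmost "word.word/" match, or None."""
    # \w matches letters, digits and "_" but not "-" or ".", so each \w+ covers
    # at most one label; the \/ makes the match end right before a slash, which
    # for these samples is the end of the host name.
    m = PATTERN.search(url)
    return m.group("domain") if m else None

SAMPLES = [
    "crl.microsoft.com/pki/crl/products/MicRooCerAut2011_2011_03_22.crl",
    "safebrowsing-cache.google.com/",
    "p4-a2lp5grl52xoy-qpo2s4ky6vs36rpb-794312-s1-v6exp3-v4.metric.gstatic.com/",
    "de.tynt.com/deb/v2?id=dZxfWCGner46jsacwqm_6l&r=lyricstranslate.com/en/l039amour-c039est-pour-rien-love-nothing.html",
    "a248.e.akamai.net/",
]

for url in SAMPLES:
    print(domain_from_url(url))
# microsoft.com, google.com, gstatic.com, tynt.com, akamai.net (one per line)
```

Edge cases in the same sketch: `domain_from_url("example.com")` returns `None` because no slash follows the host, `"my-site.com/"` gives `site.com` because `\w` stops at the hyphen, and `"bbc.co.uk/"` gives `co.uk` because only the last two labels are kept.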

ccsfdave
Builder

@somesoni2

You have it, but help me understand it so that I can apply it to my search. As @Rhin0Crash suggested, the Palo Altos call the field "url", so my base search is:

index=pan_logs sourcetype=pan* src_ip=x.x.x.x url=*

Rhin0Crash
Path Finder

@ccsfdave:

index=pan_logs sourcetype=pan* src_ip=x.x.x.x url=* | rex field=url "(?<domain>\w+\.\w+)\/" | table domain _raw

ccsfdave
Builder

Yup, you got it!

| rex field=url "(?<domain>\w+\.\w+)\/"
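
What makes this work is pointing `rex` at the field that actually holds the URL: `rex field=...` reads only the field you name, and Splunk field names are case-sensitive, so `field=URL` finds nothing when the Palo Alto events carry `url`. Below is a rough Python illustration; the events and the `rex` helper are hypothetical stand-ins, not Splunk's implementation.

```python
import re

# Hypothetical events standing in for results of
# index=pan_logs sourcetype=pan* src_ip=x.x.x.x url=*
EVENTS = [
    {"url": "crl.microsoft.com/pki/crl/products/MicRooCerAut2011_2011_03_22.crl"},
    {"url": "a248.e.akamai.net/"},
]

def rex(events, field, pattern):
    """Hypothetical stand-in for `rex field=<field> "<pattern>"`: copy the
    named groups of the first match into each event that has the field."""
    regex = re.compile(pattern)
    for event in events:
        value = event.get(field)  # exact, case-sensitive key lookup
        if value is None:
            continue  # field absent: nothing is extracted and no error is raised
        m = regex.search(value)
        if m:
            event.update(m.groupdict())
    return events

PATTERN = r"(?P<domain>\w+\.\w+)\/"
lower = rex([dict(e) for e in EVENTS], "url", PATTERN)
upper = rex([dict(e) for e in EVENTS], "URL", PATTERN)
print([e.get("domain") for e in lower])  # ['microsoft.com', 'akamai.net']
print([e.get("domain") for e in upper])  # [None, None]
```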

Rhin0Crash
Path Finder
search | rex field=_raw "(?<domain>\w+)\.(com|net|gov|edu|co)"

I think that should do it.

You can replace _raw with whichever field the PA gives you for the URL. That might be URL, misc, or uri.
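
This pattern is close to the approach described in the question: find the first `.com`, `.net`, etc. and take the label in front of it, so it returns `microsoft` rather than `microsoft.com`. The Python sketch below (Python's `re` standing in for Splunk's PCRE; `(?P<domain>...)` is Python's spelling of `(?<domain>...)`) compares the pattern as posted with a hypothetical stricter variant that is not from the thread: it adds `org`, which the question mentions, and uses a lookahead so the suffix must be a whole label.

```python
import re

# The pattern as posted: the label in front of the first com/net/gov/edu/co.
POSTED = re.compile(r"(?P<domain>\w+)\.(com|net|gov|edu|co)")

# Hypothetical stricter variant (not from the thread): adds "org" and uses
# (?![\w-]) so the suffix must end the label; "example.community" is then not
# read as "example" + ".com".
STRICT = re.compile(r"(?P<domain>\w+)\.(?:com|net|org|gov|edu|co)(?![\w-])")

def first_label(pattern, text):
    """Return the "domain" group of the leftmost match, or None."""
    m = pattern.search(text)
    return m.group("domain") if m else None

TEXTS = [
    "crl.microsoft.com/pki/crl/products/MicRooCerAut2011_2011_03_22.crl",
    "de.tynt.com/deb/v2?id=dZxfWCGner46jsacwqm_6l&r=lyricstranslate.com/en/l039amour-c039est-pour-rien-love-nothing.html",
    "a248.e.akamai.net/",
    "example.org/",
    "example.community/",
]

for text in TEXTS:
    print(first_label(POSTED, text), first_label(STRICT, text))
# microsoft microsoft / tynt tynt / akamai akamai / None example / example None
```

When the pattern runs against `_raw`, it scans the whole event, so an earlier part of the event that happens to contain something like `host.com` could match first; pointing `field=` at the URL field, as suggested above, avoids that.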