Splunk Search

How do you extract the URL into a separate field?

SplunkMasterSne
Explorer

Hello,

I'm trying to extract the URL from the message field, so I can create a separate field called URLs. At the moment, all our logs are in the message field, so extracting various parts is essential.

Can anyone help with the extraction of the URL, for example our log below? I need to extract www.bbc.co.uk:

2019-02-09 23:19:43 "proxyname" 20760 10.10.10.10 e1000000 proxy\proxy-rule-HTTP-Allowed - OBSERVED "News/Media" - 200 TCP_TUNNELED CONNECT - tcp www.bbc.co.uk 443 / - - "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko" 15.15.15.15 77608 1760 - "BBC" "none"

Thank you!

0 Karma
1 Solution

jbrocks
Communicator

You could use the field extractor for this or use props.conf to extract the field. For example with regex \d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?<url>[^\s]+)\s+ so on props.conf:

[my_sourcetype]

EXTRACT-url = \d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?<url>[^\s]+)

some thing like that. I think there are better ways to form the regex, was just a short test with regex101.com .. but hope this helps anyway. But I think the best wy, if there is no addon is to create an app, and extract all the values in a stanza of transforms.conf.

View solution in original post

jbrocks
Communicator

You could use the field extractor for this or use props.conf to extract the field. For example with regex \d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?<url>[^\s]+)\s+ so on props.conf:

[my_sourcetype]

EXTRACT-url = \d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?:[^\s]+)\s+(?<url>[^\s]+)

some thing like that. I think there are better ways to form the regex, was just a short test with regex101.com .. but hope this helps anyway. But I think the best wy, if there is no addon is to create an app, and extract all the values in a stanza of transforms.conf.

SplunkMasterSne
Explorer

Thank you for the help, works perfectly

0 Karma

FrankVl
Ultra Champion

What type of proxy is this and have you checked whether there is an add-on available for this already, that does the field extraction for you?

This looks like a slight variation on the W3C or squid log formats?

0 Karma
Get Updates on the Splunk Community!

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

The Transformative Power of AI and ML in Enhancing Observability   In the realm of IT operations, the ...

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...