
What is wrong with this regular expression to extract the URL from our logs?

harrisoncs
Explorer

I am attempting to extract the URL from our web filter logs. The automatic field extraction process did not work. I now have a partially working expression, but I can't find the reason it doesn't fully work. See below:

(?(https|http|ftp)://[a-zA-Z0-9.\-_]+/[a-zA-Z0-9+&@#/%=~_\-|!:,.;]*)

This expression returns only a couple of http URLs. It doesn't match any https URLs, even though the preview shows plenty of candidates. Is there something simple I'm missing? One iteration had only https in the expression; however, it returned no results. As it stands now, the expression would return no results for the sample data below, because that URL is https.

Sample data (IPs have been changed)

"May 12 15:30:26 10.10.10.10 May 12 19:30:21 Sourcefire3D WFAccessURL: Protocol: TCP, SrcIP: 20.20.20.20, OriginalClientIP: ::, DstIP: 30.30.30.93, SrcPort: 64776, DstPort: 443, TCPFlags: 0x0, IngressInterface: Cisco, EgressInterface: outside, DE: Primary Detection Engine (dc1c2f78-185f-11e6-a6f7-dabf06bba1d5), Policy: SFR-Policy, ConnectType: Start, AccessControlRuleName: Unknown, AccessControlRuleAction: Allow, Prefilter Policy: Unknown, UserName: No Authentication Required, Client: SSL client, ApplicationProtocol: HTTPS, InitiatorPackets: 3, ResponderPackets: 1, InitiatorBytes: 715, ResponderBytes: 66, NAPPolicy: Balanced Security and Connectivity, DNSResponseType: No Error, Sinkhole: Unknown, URLCategory: Uncategorized, URLReputation: Risk unknown, URL: https://www.splunk.com";
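A quick way to check the expression outside Splunk is to run it against the sample. The sketch below uses Python's re module (an assumption for illustration; Splunk uses PCRE, but alternation and character classes behave the same way here) and leaves out the leading `(?` of the group, which looks garbled in the post. It points to one likely reason https URLs never match: the pattern requires a `/` right after the host name, and the sample URL ends at `www.splunk.com`.

```python
import re

# Scheme, host, and path portion of the expression from the question.
# The garbled "(?" group prefix is left out so Python can compile it.
pattern = r"(https|http|ftp)://[a-zA-Z0-9.\-_]+/[a-zA-Z0-9+&@#/%=~_\-|!:,.;]*"

# Tail of the sample event (surrounding quotes removed).
sample = "URLReputation: Risk unknown, URL: https://www.splunk.com"

# No match: after "www.splunk.com" the pattern still needs a literal "/".
print(re.search(pattern, sample))  # None

# The same URL with a trailing slash (hypothetical input) does match.
print(re.search(pattern, "URL: https://www.splunk.com/").group(0))
# https://www.splunk.com/
```

If the web filter logs HTTPS traffic by host name only, as this one sample suggests, the mandatory slash would explain why only a few http URLs (those with paths) come back. That is an inference from a single event, not something confirmed in the thread.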

woodcock
Esteemed Legend

Why are you complicating it so much? Why not something like this:

(?:https|http|ftp)?:\/\/(?<URL>\S+)
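To see what this pattern captures, here is a small sketch using Python's re module (chosen only for illustration). Python spells named groups `(?P<URL>...)` rather than the `(?<URL>...)` form Splunk's PCRE accepts, so only the group syntax is adjusted. Note that the scheme sits outside the named group, so `URL` holds only what follows `://`, and the `?` after the alternation makes the scheme itself optional.

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
pattern = r"(?:https|http|ftp)?:\/\/(?P<URL>\S+)"

# Tail of the sample event from the question, without the surrounding quotes.
event = "URLReputation: Risk unknown, URL: https://www.splunk.com"
print(re.search(pattern, event).group("URL"))  # www.splunk.com

# \S+ stops only at whitespace, so punctuation glued to the URL is kept
# (hypothetical input, not from the thread).
other = "URL: http://example.com/path, Next: x"
print(re.search(pattern, other).group("URL"))  # example.com/path,
```

In the question's sample the URL is the last field on the line, so `\S+` ends cleanly there; if the quote and semicolon shown in the post are really part of the raw event, they would end up in the capture as well.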


harrisoncs
Explorer

I wanted to accept all of the answers; I accepted the one I used to accomplish my goal. I appreciate everyone's input. I started using regex101 last week and intend to use it to get further along.


woodcock
Esteemed Legend

You can upvote any answer or comment (and should, if they helped or educated you at all).


ddrillic
Ultra Champion

You can start simple.

This one matches: (https|http|ftp):\/\/www.splunk.com

Then:

(https|http|ftp):\/\/([a-zA-Z0-9\.]*)
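The same build-it-up approach can be tried in a few lines of Python (used here only as a convenient test harness; Splunk's PCRE treats these two patterns the same way). The first pattern matches the literal host; the second widens it to a capture group of letters, digits, and dots, which picks up the host name.

```python
import re

sample = "URLReputation: Risk unknown, URL: https://www.splunk.com"

# Step 1: the literal host. Unescaped "." matches any character, which is
# harmless here but would also accept a look-alike such as "wwwXsplunk.com".
step1 = r"(https|http|ftp):\/\/www.splunk.com"
print(re.search(step1, sample).group(0))  # https://www.splunk.com

# Step 2: generalize the host to a character class and capture it.
step2 = r"(https|http|ftp):\/\/([a-zA-Z0-9\.]*)"
m = re.search(step2, sample)
print(m.group(1), m.group(2))  # https www.splunk.com
```

Because `/` is not in the step 2 character class, the capture stops at the end of the host, so any path after it would not be included.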

This utility is just sensational: regex101

grimlock
Path Finder

You need to escape special characters like slash and period.
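A small sketch of why escaping matters for the period, written with Python's re module as an illustration (not Splunk itself): unescaped, `.` matches any character; escaped as `\.`, it matches only a literal dot, and `re.escape` can produce the escaped form from a literal string. In Python's re the forward slash has no special meaning, so `/` and `\/` match the same character.

```python
import re

host = "www.splunk.com"

# Unescaped dots match any character, so a look-alike host also matches.
print(bool(re.fullmatch(r"www.splunk.com", "wwwXsplunkXcom")))  # True

# Escaped dots match only a literal ".".
print(bool(re.fullmatch(r"www\.splunk\.com", "wwwXsplunkXcom")))  # False

# re.escape builds the escaped pattern from a literal string.
print(re.escape(host))  # www\.splunk\.com

# "/" is not special in Python's re; escaping it is allowed but changes nothing.
print(bool(re.fullmatch(r"https:\/\/", "https://")))  # True
print(bool(re.fullmatch(r"https://", "https://")))  # True
```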

Please refer to the following link for a list of special characters.
http://regular-expressions.mobi/characters.html?wlr=1

Hope that helps.