Splunk Search

Extracting fields from an existing Field

psheck117
New Member

I am working on some http_referer analysis from my proxy logs, which seems like an interesting thing to do. I want to add a search-time field extraction that rips apart the http_referer field, so I can search on its individual parts.

Can I do something like:

transforms.conf:
REGEX = field=http_referrer ^(?\w+)://

*Yes, I realize my field name isn't the same as the RFC... haha, official misspelling 😕

I can build the whole thing out with a single line, and I am sure the hardware can handle the overhead without issue (I hope), but I'd rather have a field anchor of some sort to go off of.

Am I missing something on this?

Afterthought: I can do a content match on the :// since nothing else in the logs should contain that combination of characters in ASCII; any colons in the URI will be in hex or something else.

Thanks.


martin_mueller
SplunkTrust

I believe you're looking for the SOURCE_KEY setting in transforms.conf, see http://docs.splunk.com/Documentation/Splunk/latest/Admin/transformsconf for details.

As for building a regex to match on "something ending with ://", that will work but not be a pinnacle of efficiency. The automaton working to match the regex will constantly try to start, walk along, and then fail repeatedly - much like running a Splunk search using key=*value. It's much faster to have quick failures by anchoring the start to something.
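As a rough sketch of how SOURCE_KEY could be wired up for a search-time extraction: the stanza name, sourcetype name, and extracted field names below are placeholders rather than anything from the original post, and the sketch assumes http_referer has already been extracted by the time this transform runs.

```ini
# transforms.conf
[http_referer_parts]
# Hypothetical stanza name. SOURCE_KEY makes REGEX run against the value of
# the http_referer field instead of the default, the raw event text.
SOURCE_KEY = http_referer
# ^ anchors at the start of the field value, so a non-matching value fails
# on the first character instead of being rescanned at every offset.
# Both capture group names are illustrative.
REGEX = ^(?<http_referer_scheme>\w+)://(?<http_referer_host>[^/]+)

# props.conf
[my_proxy_sourcetype]
# Hypothetical sourcetype name. REPORT- attaches the transform at search time.
REPORT-http_referer_parts = http_referer_parts
```

Since the regex only ever sees the field value, the ^ anchor gives the quick-failure behavior described above without needing a literal like field= in the pattern.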

psheck117
New Member

Yeah, I realized that after I committed my transform... reading RFC 1945 has been enlightening, to say the least. Here is a crack at a proper REGEX for the scheme; I will comment and add the http_referer_uri_extension after testing.

REGEX = (?[a-zA-Z+.-]+)://(?S[^/]+)((?/.[^?]+))?((??.*))?

Ha! Looking at my regex makes me question if I can tighten it a little better too.


martin_mueller
SplunkTrust

A rather theoretical comment on that - if you truly want to capture every imaginable URI scheme, using \w+ isn't going to catch them all. There are more or less obscure schemes with dots and dashes in them.
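To make that concrete: RFC 3986 defines a scheme as a letter followed by any mix of letters, digits, "+", "-", and ".". A hedged variant of the scheme capture along those lines might look like the following, where the stanza name and the capture group name are placeholders and http_referer is assumed to be the field holding the referrer.

```ini
# transforms.conf
[http_referer_scheme]
# Hypothetical stanza name; apply the regex to the referrer field only.
SOURCE_KEY = http_referer
# First character must be a letter; the rest may be letters, digits,
# plus, dot, or dash (the dash is last in the class so it stays literal).
REGEX = ^(?<http_referer_scheme>[a-zA-Z][a-zA-Z0-9+.-]*)://
```

Unlike \w+, this class accepts schemes containing dots or dashes and rejects underscores, which are not valid scheme characters.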


psheck117
New Member

Here is the full regex for my http_referer extraction. If you do something like this you may be surprised with what shows up as a referrer scheme.

REGEX = (?\w+)://(?\S[^/]+)((?/.[^?]+))?((?\?.*))?

I could probably get into the depth of http_referer_uri_extension, but that is hit or miss, and right now I am not sure I need the detail. Though, thinking about it, I could slip it in there.

My first inclination was to break it out into multiple extractions too.


martin_mueller
SplunkTrust

If you know you're only going to encounter http and https, consider using https? as your regex... it'll at least help someone read it later.


psheck117
New Member

Thanks Martin! I will check out & use SOURCE_KEY, I knew I was missing something.

As for my regex, it is definitely not going to end on ://. Though there is only one place in the event where that will exist: the http:// or https:// in the referrer field, if it is present at all. I didn't want to put my whole regex into the question, so I stopped at the first extracted field.

Thanks again!
