Splunk Search

Why is my props.conf and transforms.conf configuration not extracting fields from access_combined logs with a vhost?

lukas_loder
Communicator

Hi

I have a problem with my access_combined logs, which have a vhost at the beginning, like this:

www.domain.com:80 10.60.50.40 - - [04/Nov/2015:11:14:26 +0100] "GET /path/to/file/custom/flexslider.css HTTP/1.1" 200 1663 "http://www.domain.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"

When I index it, the access_combined fields are not extracted.
I already tried creating a new transforms.conf and props.conf.

I'm indexing those logs with sourcetype=webserver_access_combined

Props.conf

[webserver_access_combined]
pulldown_type = true 
maxDist = 28
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = vhost-access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = \[
category = Web
description = National Center for Supercomputing Applications (NCSA) combined format HTTP web server logs (can be generated by apache or other web servers)

Transforms.conf

[vhost-access-extractions]
# matches access-common or access-combined apache logging formats
# Extracts: clientip, clientport, ident, user, req_time, method, uri, root, file, uri_domain, uri_query, version, status, bytes, referer_url, referer_domain, referer_proto, useragent, cookie, other (remaining chars)  
# Note: referer is misspelled in purpose because that is the "official" spelling for "HTTP referer" 
REGEX = ^[[nspaces:vhost]]\s++[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]](?:\s++"(?<referer>[[bc_domain:referer_]]?+[^"]*+)"(?:\s++[[qstring:useragent]](?:\s++[[qstring:cookie]])?+)?+)?[[all:other]]

I have these configurations on my indexers. I also see the logs with the correct sourcetype, but the field extraction doesn't work.

Does somebody have an idea why it doesn't work?

Thanks!

0 Karma

woodcock
Esteemed Legend

Your REGEX is crazy; try this one:

REGEX=^(?<vhost>\S+)\s+(?<clientip>\S+)\s++(?<ident>\S+)\s+(?<user>\S+)\s+\[(?<req_time>[^\]]+)\]\s+"(?<access_request>[^"]+)"\s+(?<status>\S+)\s+(?<bytes>\S+)\s+"(?<referrer>[^"]+)"\s+"(?<user_agent>[^"]+)"
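
As a rough sketch of how this regex might be wired in, assuming the sourcetype and stanza names from the question are kept: because the pattern uses named capture groups, a search-time REPORT should not need a FORMAT line. REPORT- extractions run at search time, so in a distributed setup both files would need to be present on the search head (for example in an app there), not only on the indexers.

```ini
# props.conf (assumed location: an app on the search head)
[webserver_access_combined]
# Points at the transforms.conf stanza below; names taken from the question.
REPORT-access = vhost-access-extractions

# transforms.conf (same app as props.conf)
[vhost-access-extractions]
# Named groups become field names directly, so FORMAT is omitted.
REGEX = ^(?<vhost>\S+)\s+(?<clientip>\S+)\s++(?<ident>\S+)\s+(?<user>\S+)\s+\[(?<req_time>[^\]]+)\]\s+"(?<access_request>[^"]+)"\s+(?<status>\S+)\s+(?<bytes>\S+)\s+"(?<referrer>[^"]+)"\s+"(?<user_agent>[^"]+)"
```

Against the sample event in the question, vhost would capture www.domain.com:80 (host and port together), since \S+ runs up to the first space.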
0 Karma

hagjos43
Contributor

Did you build out your extractions and confirm them in something like regex101? I copied your example log and your extraction, and it did not match. I made a start, and for the first few fields it would look more like this: ^(?<vhost>\S+):(?<clientport>\d+)\s(?<clientip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s

Also, you'll want your extractions to take place at search time in your props.conf, like this:

EXTRACT-blah = ^(?<vhost>\S+):(?<clientport>\d+)\s(?<clientip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s
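
For context, an EXTRACT- line sits under the sourcetype stanza in props.conf. A minimal sketch, assuming the sourcetype from the question; the class name vhost_prefix is a made-up label, and the pattern here is anchored at the start of the event with ^ and has the dots in the IP address escaped:

```ini
# props.conf
[webserver_access_combined]
# Inline search-time extraction; "vhost_prefix" is a hypothetical class name.
# "clientport" keeps the field name used above; it captures the port that follows the vhost.
EXTRACT-vhost_prefix = ^(?<vhost>\S+):(?<clientport>\d+)\s(?<clientip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s
```

Against the sample event this would capture www.domain.com, 80 and 10.60.50.40 for the three fields.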
0 Karma

lukas_loder
Communicator

I just used the original that was in transforms.conf, like this:

REGEX = ^[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]](?:\s++"(?<referer>[[bc_domain:referer_]]?+[^"]*+)"(?:\s++[[qstring:useragent]](?:\s++[[qstring:cookie]])?+)?+)?[[all:other]]

and tried to change it... So this isn't the correct way?

0 Karma

hagjos43
Contributor

Based on what I'm seeing, that won't work. To see if your regex works, do something like this (the double quotes inside the rex string have to be escaped):

Your Search | rex "^(?<vhost>\S+)\s+(?<clientip>\S+)\s++(?<ident>\S+)\s+(?<user>\S+)\s+\[(?<req_time>[^\]]+)\]\s+\"(?<access_request>[^\"]+)\"\s+(?<status>\S+)\s+(?<bytes>\S+)\s+\"(?<referrer>[^\"]+)\"\s+\"(?<user_agent>[^\"]+)\""
0 Karma