Apache logfile with virtualhost added to logs

phoenixdigital
Builder

Hi All,

We are trying to index a set of web servers that each have many virtual hosts on them. Adding the virtual host to the log is simple enough in Apache: change the LogFormat from


LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

to

LogFormat "%V %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" vcombined

However, this breaks the automatic field extraction that Splunk normally performs on Apache log files.

So I dug into /opt/splunk/etc/system/default/transforms.conf and found these lines

[access-extractions]
# matches access-common or access-combined apache logging formats
# Extracts: clientip, clientport, ident, user, req_time, method, uri, root, file, uri_domain, uri_query, version, status, bytes, referer_url, referer_domain, referer_proto, useragent, cookie, other (remaining chars)
# Note: referer is misspelled in purpose because that is the "official" spelling for "HTTP referer"
REGEX = ^[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[nspaces:bytes]?[[all:other]]

and in /opt/splunk/etc/system/default/props.conf found this


[access_combined]
pulldown_type = true
maxDist = 28
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = [

I can see I just need to add a [[nspaces:vhost]]\s to the front of the transforms.conf regex, but obviously I don't want to modify the defaults.

I tried to replicate what I saw in props.conf and transforms.conf in my own app, but it didn't work.

my inputs.conf

[monitor:///etc/httpd/logs/access_log*]
sourcetype = vhost_access_combined
disabled = false
followTail = 0
host = development.server.com
index = webserver

my props.conf

[vhost_access_combined]
pulldown_type = true
maxDist = 28
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = vhost-access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = [

my transforms.conf

[vhost-access-extractions]
# matches access-common or access-combined apache logging formats
# Extracts: vhost, clientip, clientport, ident, user, req_time, method, uri, root, file, uri_domain, uri_query, version, status, bytes, referer_url, referer_domain, referer_proto, useragent, cookie, other (remaining chars)
# Note: referer is misspelled in purpose because that is the "official" spelling for "HTTP referer"
REGEX = ^[[nspaces:vhost]]\s++[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[nspaces:bytes]?[[all:other]]

Any ideas how to get this working?

I have more complex questions to follow about setting the host in Splunk to the value of vhost from the log entry, but I will take this in baby steps first.

oscarspaz
Explorer

A Universal Forwarder does not perform any parsing.

http://docs.splunk.com/Documentation/Splunk/latest/Deploy/Typesofforwarders

phoenixdigital
Builder

OK, I eventually got it to work using the following:

inputs.conf


[monitor:///etc/httpd/logs/access_log*]
sourcetype = advanced_access_combined
index = webserver
disabled = false
followTail = 0
host = devserver.remora.com.au

[monitor:///etc/httpd/logs/error_log*]
index = webserver
disabled = false
followTail = 0
host = devserver.remora.com.au

props.conf


[advanced_access_combined]
pulldown_type = true
maxDist = 28
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = advanced-access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = [

transforms.conf


[all_lazy]
REGEX = .*?

[all]
REGEX = .*

[nspaces]
# matches one or more NON space characters
REGEX = S+

[qstring]
#matches a quoted "string" - extracts an unnamed variable - name MUST be provided as in [[qstring:name]]
# Extracts: empty-name-group (needs name)
REGEX = "(?<>[^"]*+)"

[sbstring]
#matches a string enclosed in [] - extracts an unnamed variable - name MUST be provided as in [[sbstring:name]]
# Extracts: empty-name-group (needs name)
REGEX = [(?<>[^]]*+)]

[bc_domain]
REGEX = (?<domain>w++://[^/s"]++)

[bc_uri]
# backwards compatible uri regex
# uri = path optionally followed by query [/this/path/file.js?query=part&other=var]
# path = root part followed by file [/root/part/file.part]
# Extracts: uri, uri_path, root, file, uri_query, uri_domain (optional if in proxy mode)
REGEX = (?<uri>[[bc_domain:uri_]]?+(?<uri_path>[[uri_root]]?[[uri_seg]](?<file>[^s?/]+)?)(?:?(?<uri_query>[^s]))?)

[reqstr]
REGEX = [^s"]++

[access-request]
# very relaxed regex for extracting fields from the request
REGEX = "s*+[[reqstr:method]]?(?:s++[bc_uri])?s+"

[advanced-access-extractions]
REGEX = ^[[nspaces:vhost]]s++[[nspaces:clientip]]s++[[nspaces:ident]]s++[[nspaces:user]]s++[[sbstring:req_time]]s++[[access-request]]s++[[nspaces:status]]s++[[nspaces:bytes]]s++[nspaces:req_process_time]?[[all:other]]

It seems I needed to copy a lot of the helper stanzas from /opt/splunk/etc/system/default/transforms.conf, which makes sense.

Another issue I encountered was that I have a primary index server, and the Apache log files are being forwarded to it by a Universal Forwarder.

Nothing worked while props.conf and transforms.conf were on the Universal Forwarder. I needed to add them to the indexing server for the log files to be parsed correctly.
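
Put as a layout, the working arrangement described above might look like the sketch below. The split is an assumption drawn from this post: REPORT- extractions are applied at search time, so they have to live on the instance where searches run, which in this setup is the indexing server. The stanza names come from the configs above; the comments are illustrative only.

```ini
# --- Universal Forwarder: inputs.conf only ---
# The forwarder tails the file and tags it with a sourcetype; it does not parse.
[monitor:///etc/httpd/logs/access_log*]
sourcetype = advanced_access_combined
index = webserver
disabled = false

# --- Indexing server: props.conf ---
# Ties the sourcetype set on the forwarder to the extraction defined below.
[advanced_access_combined]
REPORT-access = advanced-access-extractions

# --- Indexing server: transforms.conf ---
# The [advanced-access-extractions] stanza and the helper stanzas it
# references belong here, not on the forwarder.
```

The sourcetype name is the only link between the two machines, so it has to match exactly in inputs.conf and props.conf.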

This is potentially going to be an issue, as I would like the virtual host from the log entry to become the Splunk host. The host is currently defined on the Universal Forwarder in inputs.conf, but I don't extract the virtual host until the event reaches transforms.conf on the indexing server, and I think by then it will be too late to set the Splunk host. Anyway, I will create a new question for that, as it is outside the scope of this one.
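
For readers with the same goal: a per-event host override is normally configured as an index-time transform, and index-time transforms are applied on the indexing server when the data arrives from a Universal Forwarder. The following is a minimal sketch that was not tested in this thread. The transform name vhost_as_host is hypothetical, and the regex assumes the virtual host is the first whitespace-delimited token of each event, as in the vcombined format above.

```ini
# props.conf on the indexing server
[advanced_access_combined]
# TRANSFORMS- (unlike REPORT-) is applied at index time
TRANSFORMS-vhost_as_host = vhost_as_host

# transforms.conf on the indexing server
[vhost_as_host]
# Capture the first token of the event (the %V virtual host)
REGEX = ^(\S+)\s
# Write the captured value into the event's host metadata
DEST_KEY = MetaData:Host
FORMAT = host::$1
```

Because this runs at index time, it would only affect events indexed after the change; events already in the index keep the host set in inputs.conf.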

Edit: The formatting rules here are useless when pasting in conf files, so the configs above are a bit munted. If someone needs the configs, message me (if that's possible with Splunkbase).

sloshburch
Splunk Employee

Did Answers remove the leading backslash from your \s++? It's only showing 's++'.
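
If the forum did strip the backslashes, the simpler helper stanzas in the post above would presumably have read as shown below. This is a guess made under that single assumption, not a copy of the original files. Stanzas that look damaged in other ways (for example [bc_uri], [access-request] and the final extraction regex) are deliberately left out rather than reconstructed.

```ini
[nspaces]
# matches one or more NON space characters
REGEX = \S+

[sbstring]
# matches a string enclosed in [] - name MUST be provided as in [[sbstring:name]]
REGEX = \[(?<>[^\]]*+)\]

[bc_domain]
REGEX = (?<domain>\w++://[^/\s"]++)

[reqstr]
REGEX = [^\s"]++
```

The same substitution (s++ back to \s++) would apply between the fields of the [advanced-access-extractions] regex.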

phoenixdigital
Builder

Here is an example log line

developer.management.theclient.rdev.com 192.168.31.108 - stingray [06/Jul/2011:12:33:21 +1000] "GET /pop.php?m=testimonial/edit&id=1 HTTP/1.1" 200 166 "http://developer.management.theclient.rdev.com/?m=testimonial/details&id=1" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"
