I'm trying to define a custom set of fields for a sourcetype and am finding that the "train" command is a) tedious and b) doesn't work. Here's the basic format of my Apache log:
LogFormat "%h %l %u %t %P \"%r\" %>s %X %b %I %O %D \"%{Referer}i\" \"%{User-Agent}i\" \"%{Host}i\" \"%{X-Forwarded-For}i\" \"%{X-Cluster-Client-IP}i\" \"%{True-Client-IP}i\" \"%{Via}i\" \"%{Akamai-Origin-Hop}i\""
I just want a way to create a definition from this that extracts these fields and am not finding a good way to do this. Am I missing something?
If your events follow a single format, you can define the extraction directly in props.conf; if you need to handle multiple formats, use a combination of props.conf and transforms.conf. Splunk ships with default extractions for Apache access and error logs. You can find them in $SPLUNK_HOME/etc/system/default/transforms.conf under the "access-extractions" stanza:
[access-extractions]
# matches access-common or access-combined apache logging formats
# Extracts: clientip, clientport, ident, user, req_time, method, uri, root, file, uri_domain, uri_query, version, status, bytes, referer_url, referer_domain, referer_proto, useragent, cookie, other (remaining chars)
# Note: referer is misspelled in purpose because that is the "official" spelling for "HTTP referer"
REGEX = ^[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]](?:\s++"(?<referer>[[bc_domain:referer_]]?+[^"]*+)"(?:\s++[[qstring:useragent]](?:\s++[[qstring:cookie]])?+)?+)?[[all:other]]
If you need a custom format, you can use the stanza above as a template for a new extraction that matches your particular log file.
To do that, add a stanza to both your props.conf and your transforms.conf:
props.conf
[customsourcetype]
# REPORT- applies the transform at search time; TRANSFORMS- is for index-time processing
REPORT-logformat = customlogformat
transforms.conf
[customlogformat]
REGEX = ****insert regex here****
FORMAT = field1::$1 field2::$2 field3::$3
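For the LogFormat in the question, the two stanzas might look something like the sketch below. It is untested against real data, so treat it as a starting point rather than a finished extraction. The stanza names are the placeholders used above, and every field name (pid, conn_status, bytes_in, bytes_out, duration_us, host_header and so on) is my own choice, not something Splunk defines. It uses REPORT-, the search-time setting, and named capture groups, which should make a separate FORMAT line unnecessary for a search-time extraction. It also assumes that none of the quoted values contain an embedded double quote.

```ini
# props.conf
[customsourcetype]
# Search-time extraction that points at the transforms.conf stanza below
REPORT-logformat = customlogformat

# transforms.conf
[customlogformat]
# One capture group per LogFormat directive, in order:
# %h %l %u %t %P "%r" %>s %X %b %I %O %D, then the eight quoted request headers
REGEX = ^(?<clientip>\S+)\s+(?<ident>\S+)\s+(?<user>\S+)\s+\[(?<req_time>[^\]]+)\]\s+(?<pid>\S+)\s+"(?<request>[^"]*)"\s+(?<status>\S+)\s+(?<conn_status>\S+)\s+(?<bytes>\S+)\s+(?<bytes_in>\S+)\s+(?<bytes_out>\S+)\s+(?<duration_us>\S+)\s+"(?<referer>[^"]*)"\s+"(?<useragent>[^"]*)"\s+"(?<host_header>[^"]*)"\s+"(?<x_forwarded_for>[^"]*)"\s+"(?<x_cluster_client_ip>[^"]*)"\s+"(?<true_client_ip>[^"]*)"\s+"(?<via>[^"]*)"\s+"(?<akamai_origin_hop>[^"]*)"
```

Most unquoted fields are matched with \S+ rather than \d+ because Apache writes "-" when a value such as %b is empty. If your user agents or referers can contain escaped quotes, the [^"]* groups will need to be loosened accordingly.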