Can you please help us extract access_log-style fields (the full access_log format) from the logs shown below? The logs arrive in syslog format through a syslog-ng forwarder. After indexing, Splunk assigns them the default syslog sourcetype and does not extract any access_log fields automatically. We are trying to create a new sourcetype and assign it so that the fields are populated automatically.
Sample Log :
varnishncsa bal-1234 1.48.1.2 - - [22/Aug/2014:15:04:45 +0000] "GET http://www.test.com/error HTTP/1.1" 404 30041 "http://www.test.com/test" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.78.2 (KHTML, like Gecko) Version/7.0.6 Safari/537.78.2" 0.090000868 miss pass request_id="v-a732c002-2a0d-11e4-88b7-12313d2d8c3b" "-"
varnishncsa bal-1234 1.2.7.8 - - [22/Aug/2014:15:04:45 +0000] "GET http://www.test.com/?page=2 HTTP/1.1" 302 26 "-" "Mozilla/5.0 (compatible; MSIE 8.0; WOW64; Windows NT 5.1; Trident/4.0)" 0.420004606 miss miss request_id="v-a6fd5804-2a0d-11e4-838f-12313d2d8c3b" "-"
varnishncsa bal-1234 6.7.5.22 - - [22/Aug/2014:15:04:45 +0000] "GET http://www.test.com/ HTTP/1.1" 200 38087 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6" 0.000000000 hit hit request_id="v-8225c840-2a0d-11e4-8d96-12313d2d8c3b" "-"
varnishncsa bal-1234 5.2.17.9 - - [22/Aug/2014:15:04:44 +0000] "GET http://www.test.com/ HTTP/1.1" 302 26 "-" "Mozilla/5.0 (compatible; MSIE 8.0; WOW64; Windows NT 5.1; Trident/4.0)" 1.140012741 miss miss request_id="v-a6758a00-2a0d-11e4-945a-12313d2d8c3b" "-"
varnishncsa bal-1234 3.2.71.1 - - [22/Aug/2014:15:04:43 +0000] "GET http://www.test.com/5800722 HTTP/1.1" 404 30221 "http://www.test.com/" "Akamai-SiteSnapshot/6.9.0.2" 1.720019102 miss miss request_id="v-a616eefa-2a0d-11e4-87a4-12313d2d8c3b" "-"
Current config (it uses dynamic index routing; we need to define a new sourcetype and assign it within this config so that the fields are populated automatically):
inputs.conf
[tcp-ssl:5140]
sourcetype = syslog
props.conf
[syslog]
TRANSFORMS-idx_routing = generic_idx_routing
transforms.conf
[generic_idx_routing]
REGEX = ^(\S+)\s*
FORMAT = $1
DEST_KEY = _MetaData:Index
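For readers unfamiliar with the dynamic index routing above, here is a rough sketch of what that transform captures. It uses Python's `re` module as a stand-in for Splunk's regex engine, and the helper name `routed_index` is hypothetical: the first whitespace-delimited token of the raw event becomes `$1`, which is then written to `_MetaData:Index`.

```python
import re

# Same pattern as REGEX in [generic_idx_routing]; group 1 corresponds to $1 in FORMAT.
INDEX_RE = re.compile(r'^(\S+)\s*')


def routed_index(raw_event):
    """Return the token that the transform would use as the index name, or None."""
    match = INDEX_RE.match(raw_event)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = 'varnishncsa bal-1234 1.2.7.8 - - [22/Aug/2014:15:04:45 +0000]'
    print(routed_index(sample))
```

With the sample entries in this thread, every event would therefore be routed to an index named after the leading `varnishncsa` token.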
Even if I force an existing sourcetype when uploading the data, the fields are not extracted, because every entry is prefixed with values like "varnishncsa bal-1234 " (see the sample entries above). Please provide a matching REGEX for a new entry in transforms.conf.
props.conf :
[acquia_varnish_log]
NO_BINARY_CHECK = 1
REPORT-access = access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = \[
pulldown_type = 1
transforms.conf :
[access-extractions]
# matches access-common or access-combined apache logging formats
# Extracts: clientip, clientport, ident, user, req_time, method, uri, root, file, uri_domain, uri_query, version, status, bytes, referer_url, referer_domain, referer_proto, useragent, cookie, other (remaining chars)
# Note: referer is misspelled in purpose because that is the "official" spelling for "HTTP referer"
REGEX = ^[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]](?:\s++"(?<referer>[[bc_domain:referer_]]?+[^"]*+)"(?:\s++[[qstring:useragent]](?:\s++[[qstring:cookie]])?+)?+)?[[all:other]]
We created the REGEX below and it is working fine.
transforms.conf :
[acquia-access-extractions]
REGEX = ^[[nspaces:logfilename]]\s++[[nspaces:nodename]]\s++[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]][[all:other]]
[generic_idx_routing]
REGEX = ^varnishncsa
FORMAT = varnishncsa
DEST_KEY = _MetaData:Index
[generic_sourcetype_routing]
REGEX = .
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::acquia_access_combined
props.conf
[syslog]
TRANSFORMS-idx_routing = generic_idx_routing
TRANSFORMS-sourcetype_routing = generic_sourcetype_routing
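To sanity-check the field layout outside Splunk, the extraction can be approximated with a plain regex. This is only a sketch: it uses Python's `re` module rather than Splunk's modular `[[...]]` syntax, the request is split into just `method`, `uri` and `version` (a simplification of what `[[access-request]]` extracts), and the function name `parse_line` is hypothetical.

```python
import re

# Rough stand-in for the acquia-access-extractions transform.
# Field names mirror the ones used in this thread.
LINE_RE = re.compile(
    r'^(?P<logfilename>\S+)\s+'      # e.g. varnishncsa
    r'(?P<nodename>\S+)\s+'          # e.g. bal-1234
    r'(?P<clientip>\S+)\s+'
    r'(?P<ident>\S+)\s+'
    r'(?P<user>\S+)\s+'
    r'\[(?P<req_time>[^\]]+)\]\s+'   # timestamp inside square brackets
    r'"(?P<method>\S+)\s+(?P<uri>\S+)\s+(?P<version>[^"]+)"\s+'
    r'(?P<status>\S+)\s+'
    r'(?P<bytes>\S+)'
    r'(?:\s+"(?P<referer>[^"]*)"(?:\s+"(?P<useragent>[^"]*)")?)?'
    r'(?P<other>.*)$'                # remaining varnish-specific fields
)


def parse_line(line):
    """Return a dict of extracted fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        'varnishncsa bal-1234 3.2.71.1 - - [22/Aug/2014:15:04:43 +0000] '
        '"GET http://www.test.com/5800722 HTTP/1.1" 404 30221 '
        '"http://www.test.com/" "Akamai-SiteSnapshot/6.9.0.2" 1.720019102 '
        'miss miss request_id="v-a616eefa-2a0d-11e4-87a4-12313d2d8c3b" "-"'
    )
    for name, value in parse_line(sample).items():
        print(f"{name} = {value}")
```

The trailing response time, cache status and request_id all land in `other` here; they would need their own capture groups if you want them as separate fields.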
See the props.conf and transforms.conf files in the $SPLUNK_HOME\etc\system\default directory. They contain the sourcetype definitions and the field extractions needed for access_log-type files. Reusing those settings in your new sourcetype may be what you need.