We have a custom regex in transforms.conf and props.conf that extracts the correct hostname from our nginx logs. However, it does not work with our other sourcetypes, for example automatic. I tried adding the entries below to transforms.conf and props.conf, but the host is still not being set correctly for the automatic sourcetype.
Example
The actual host name is as1.br2.la.wiredrive.com, but the host is being reported as:
host=as3.br2.la.wiredrive.com
Mar 22 15:19:40 as1.br2.la.wiredrive.com appfuel[97326]: package="web" env="production" userId="101603" clientCode="jpmktg" guid="WD-KXMPD" view="update-project-access-log" uid="rBAWhVFMmxck7sxdBL4vAg==" URI="/?routekey=update-project-access-log.json" method="post" scope="private"
host=as3.br2.la.wiredrive.com sourcetype=automatic source=/var/log/appfuel.log
transforms.conf
[setnull]
REGEX = \.(mp4|jpg|bz2|png|gif|js|swf|jar|signed|flv|json)
DEST_KEY = queue
FORMAT = nullQueue
[nginx_host]
REGEX = [\d]{2}:[\d]{2}:[\d]{2} (?P<hostname>[^\s]+)\s+nginx:
FORMAT = host::$1
DEST_KEY = MetaData:Host
[appfuel_host]
REGEX = [\d]{2}:[\d]{2}:[\d]{2} (?P<hostname>[^\s]+)\s+automatic:
FORMAT = host::$1
DEST_KEY = MetaData:Host
props.conf
[Nginx]
NO_BINARY_CHECK = 1
pulldown_type = 1
TRANSFORMS-null = setnull, nginx_host, automatic
EXTRACT-HTTPstatus = [^&\n]*&\w+=\w+\s+(?P<HTTPstatus>\w+/\d+\.\d+"\s+\d+)
EXTRACT-UpstreamTime = (?:[^\-\n]*\-){4}"\s+\w+_\w+="\d+\.\d+"\s+(?P<UpstreamTime>[^ ]+)
EXTRACT-RequestTime = (?:[^\-\n]*\-){4}"\s+(?P<RequestTime>[^ ]+)
EXTRACT-BytesSent = (?:[^/\n]*/){6}\d+\.\d+"\s+\d+\s+(?P<BytesSent>[^ ]+)
EXTRACT-StatusOnly = (?:[^"\n]*"){2}\s+(?P<StatusOnly>[^ ]+)
EXTRACT-FIELDNAME = (?i)^(?:[^ ]* ){3}(?P<FIELDNAME>[^ ]+)
[source::/var/log/appfuel.log]
EXTRACT-AppHostname = (?:[^ \n]* ){3}(?P<AppHostname>[^ ]+)
EXTRACT-FIELDNAME = (?i)^(?:[^ ]* ){3}(?P<FIELDNAME>[^ ]+)
Any help is appreciated.
Thanks in advance
Hmm, there seem to be a few things wrong here.
1) The TRANSFORMS call in props.conf will look for an [automatic] stanza in transforms.conf, but there is none. There is, however, one called [appfuel_host].
2) I don't know if it's a good idea to call a sourcetype 'automatic', since that word may be reserved in that context, i.e. it tells Splunk to figure out the sourcetype as best it can.
3) FIELDNAME is a placeholder name, usually created by the Interactive Field Extractor. It should not be used in config files. Copy/paste?
4) There is no sourcetype-specific stanza in props.conf for the events you want to extract stuff from. A [sourcetype] stanza is usually a better choice than [source::/blah/log.log].
5) The NO_BINARY_CHECK setting is only honoured in the input phase, which happens on the instance that reads the files off disk. Usually that is a forwarder, but of course some files will be read locally by the indexer. This is not related to your other problems.
6) If this data is coming from a forwarder, check the inputs.conf and server.conf files on the forwarder to see whether the wrong hostname is explicitly set there. This has been known to happen when server images with an installed forwarder are cloned.
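Putting points 1, 3 and 4 together, a minimal, untested sketch could look like the following. The sourcetype name appfuel is a hypothetical choice (anything other than automatic), and the REGEX is an assumption based only on the single sample event above. Note that this event contains appfuel[97326]:, not automatic:, so the [appfuel_host] regex as posted would not match it either.

```ini
# inputs.conf, on the instance that reads the file
# ("appfuel" is a hypothetical sourcetype name, not something Splunk defines)
[monitor:///var/log/appfuel.log]
sourcetype = appfuel

# props.conf
[appfuel]
# The class name after TRANSFORMS- is arbitrary; the value must name
# an existing stanza in transforms.conf
TRANSFORMS-host = appfuel_host
# Search-time extraction with a real field name instead of FIELDNAME
EXTRACT-AppHostname = ^(?:[^ ]* ){3}(?P<AppHostname>[^ ]+)

# transforms.conf
[appfuel_host]
# Assumes events look like:
# Mar 22 15:19:40 as1.br2.la.wiredrive.com appfuel[97326]: ...
REGEX = \d{2}:\d{2}:\d{2} (\S+)\s+appfuel\[\d+\]:
FORMAT = host::$1
DEST_KEY = MetaData:Host
```

Index-time TRANSFORMS only run where the data is first parsed (the indexer or a heavy forwarder), so the props.conf and transforms.conf parts belong there, not on a universal forwarder. They also only apply to newly indexed events; events already indexed with the wrong host will keep it.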
Hope this helps,
Kristian