The logs are being imported through syslog-ng into a single nginx log file on a forwarder. The challenge is that Splunk sees all of the events as coming from one host, "as3.br0.la.somecompany.com", instead of the individual hosts named in each log line (right after the syslog timestamp). Those hosts follow a similar naming convention and differ only in their numbers. What is the best way to extract the correct host field at index time, so that it appears correctly in the search results? I have tried creating a field extraction with this regex:
index=nginx sourcetype="Nginx" host="*" | rex "(?i)^(?:[^ ]* ){3}(?P<fieldname>[^ ]+)" | top 50 fieldname
This appears to work; however, I would like to extract the correct host before the data is indexed. Here is a sample event:
Dec 17 17:56:04 as1.br0.la.somecompany.com nginx: 68.232.40.28 - - [17/Dec/2012:17:56:04 -0800] "GET /cdn/asset/view/client/partizan/package/library/id/300590/format/o/h/f0107484aa8d9bdf4eba080bc7c6a492/7a6013aef28682d61703dff120d21b12266b54a2a637283d7d4f0c0b4aa1f551916c39f5b8b23b8e8cb43d3055c3e48ed1864a6112 HTTP/1.1" 200 13333227 "-" "QuickTime/7.6.6 (qtver=7.6.6;cpu=IA32;os=Mac 10.6.8)" request_time 3.174 upstream_time 0.071
My inputs.conf file:
host=as3.br0.la.somecompany.com
sourcetype=Nginx
source=/var/log/nginx-access.log
index=nginx
Thank you
Mark
Splunk already ships with a transform, syslog-host, that extracts the host name from syslog-formatted input, so you can reuse it here.
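Create or edit a props.conf file alongside your inputs.conf so that it contains the following. Note that this is an index-time transform, so it only takes effect on the instance that parses the data: a heavy forwarder works, but if your forwarder is a universal forwarder, put the props.conf on the indexer(s) instead.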
[Nginx]
TRANSFORMS-Nginxhost = syslog-host
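If you would rather not depend on the built-in regex, or want to see what this kind of transform does, the sketch below spells out an equivalent custom transform. The stanza name nginx-syslog-host and the regex are assumptions written for the sample event above (a "Mon DD HH:MM:SS" timestamp followed by the host name); this is not the definition of Splunk's own syslog-host. DEST_KEY = MetaData:Host with FORMAT = host::$1 is the standard way to overwrite the host at index time.

```
# transforms.conf
# Hypothetical stanza name; a sketch tailored to the sample event, not Splunk's built-in syslog-host
[nginx-syslog-host]
# Skip the "Dec 17 17:56:04 " syslog timestamp and capture the next token (the sending host)
REGEX = ^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+(\S+)\s
# Write the captured value into the event's host metadata
DEST_KEY = MetaData:Host
FORMAT = host::$1

# props.conf
# Apply the transform to the Nginx sourcetype set in inputs.conf
[Nginx]
TRANSFORMS-nginxhost = nginx-syslog-host
```

Either way, restart Splunk on the parsing instance afterward. Only newly indexed events get the corrected host (as1..., as2..., and so on); events that were already indexed keep the host "as3.br0.la.somecompany.com" from inputs.conf.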