I run HAProxy, pick up its log with a universal forwarder, and send it to our receiver/indexer (all on the same host). I modified my props.conf as follows.
props.conf
[source::/var/log/*haproxy.log]
TRANSFORMS-syslogstripper = haproxy_syslog_stripper, haproxyfields, clientinfofields, backendfields, requestinfo, connectioninfo, queueinfo, uriinfo
[HAProxy]
MAX_TIMESTAMP_LOOKAHEAD=40
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=false
TZ = US/Mountain
REPORT-haproxyfieldextract = haproxyfields, clientinfofields, backendfields, requestinfo, connectioninfo, queueinfo, uriinfo
TRANSFORMS-haproxystuff = haproxyfields
Here is my transforms.conf where I listed pertinent HAProxy info
transforms.conf
# This will strip the syslog header (date stamp and host) from a syslog event
[haproxy_syslog_stripper]
REGEX = ^[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s[^\s]*\s(.*)$
FORMAT = $1
DEST_KEY = _raw
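To see what the stripper actually does, here is a small Python sketch using the same REGEX as the haproxy_syslog_stripper stanza above. The sample syslog line is hypothetical (the hostname "lbhost" and pid are made up; the payload is the sample HAProxy line from later in this thread):

```python
import re

# Same pattern as the haproxy_syslog_stripper REGEX above.
pattern = re.compile(r'^[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s[^\s]*\s(.*)$')

# Hypothetical syslog-wrapped HAProxy event.
line = ('Feb  6 12:14:14 lbhost haproxy[1234]: '
        '[06/Feb/2009:12:14:14.655] http-in static/srv1 10/0/30/69/109 '
        '200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} {} "GET /index.html HTTP/1.1"')

match = pattern.match(line)
if match:
    stripped = match.group(1)  # this is what Splunk writes back to _raw
    print(stripped)
```

The capture group swallows everything after the timestamp and host, so _raw is left starting at the `haproxy[...]:` tag.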
# Transform for HAProxy
[haproxyfields]
DELIMS = " "
FIELDS = haproxy_id, client_info, date_time, frontend_name, backend, request_info, status_code, response_size, val1, val2, flags, connection_info, queue_info, req_header, resp_header, method, uri_info
CLEAN_KEYS=true
#the following is used to extract values from the previous extraction
[clientinfofields]
SOURCE_KEY=client_info
DELIMS = ":"
FIELDS = client_ip,client_port
[backendfields]
SOURCE_KEY=backend
DELIMS = "/"
FIELDS = backend_name,server_name
[requestinfo]
SOURCE_KEY=request_info
DELIMS = "/"
FIELDS=request_time,queue_time,connection_time,response_time,total_time
[connectioninfo]
SOURCE_KEY=connection_info
DELIMS = "/"
FIELDS=process_connections,frontend_connections,backend_connections,server_connections,retries
[queueinfo]
SOURCE_KEY=queue_info
DELIMS = "/"
FIELDS=server_queue_size,backend_queue_size
# You can still use regex on those extractions that need it.
[uriinfo]
SOURCE_KEY=uri_info
REGEX = (?<uri>[^"]+?)
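The chained delimiter extractions above can be sketched in plain Python. The first pass (haproxyfields) splits the whole event on spaces; the later stanzas then re-split individual fields on their own delimiters. The sample values below are hypothetical:

```python
# Values as they would look after the first space-delimited pass.
client_info = "10.0.0.1:33317"    # split by clientinfofields on ":"
request_info = "10/0/30/69/109"   # split by requestinfo on "/"

client_ip, client_port = client_info.split(":")
(request_time, queue_time, connection_time,
 response_time, total_time) = request_info.split("/")

print(client_ip, client_port)  # → 10.0.0.1 33317
print(total_time)              # → 109
```

Note that in Splunk this only works at search time via REPORT, where the second-pass stanzas can read the fields produced by the first pass through SOURCE_KEY.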
I am able to get the fields listed in the haproxyfields stanza to extract using this search:
sourcetype="HAProxy" | extract haproxyfields
A hybrid approach seems to work. Too many special characters to escape, so I posted an image for props.conf.
transforms.conf:
[tmf_fields]
DELIMS=" "
FIELDS = month, day, day1, time1, source_ip, haproxy_id, client_info, date_time, frontend_name, backend, request_info, status_code, response_size, val1, val2, flags, connection_info
props.conf
This seems to have worked for HAProxy. Keep in mind that the FIELDS arguments depicted above include the syslog header fields. I personally removed them since I strip the header beforehand.
Thanks Dave!
I will try to get to this late this week or over the weekend. I am traveling until Friday, so I won't have a lot of time this week to reconfigure and test this. Keep the suggestions coming, though. I do like the idea of not having to throw | extract into the mix.
There are a couple of things going on in this setup: First, we need to clarify what is happening at index time, and what is happening at search time. It's also important to note that you really can't have extractions dependent on other extractions, as they don't execute in sequence.
Now, the first thing I notice is that you have index-time transforms being applied in the source stanza, and then the timestamp, linemerge, and TZ settings being applied by sourcetype. While they should get mashed together correctly, I'd highly recommend getting them all into the same stanza if possible.
[source::/var/log/*haproxy.log]
MAX_TIMESTAMP_LOOKAHEAD=40
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=false
TZ=US/Mountain
SOURCETYPE=HAProxy (unless this is explicitly set by the forwarder, in which case it's unnecessary, and you can make this entire stanza [HAProxy])
TRANSFORMS-syslogstripper = haproxy_syslog_stripper
EXTRACT-haproxy_fields = haproxy_fields
And then in transforms.conf you'll have the following:
[haproxy_syslog_stripper]
REGEX = ^[A-Z][a-z]+\s+\d+\s\d+:\d+:\d+\s[^\s]*\s(.*)$
FORMAT = $1
DEST_KEY = _raw
[haproxy_fields]
REGEX = SEE BELOW
Now, because you can't have extractions dependent on extractions (the field has to exist at search time, and if it comes from another search-time extraction, it doesn't), you're going to need a BIG regex to extract all of the fields. Assuming your HAProxy logs follow this format after your syslog headers are removed...
[06/Feb/2009:12:14:14.655] http-in static/srv1 10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} {} "GET /index.html HTTP/1.1"
Then you could use something like this: regexr link (because the formatting gets borked inline).
It's really long and not exactly easy to read, but it does pull out all of the fields you're looking for.
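As a rough sketch of the kind of "one big regex" described here, the following Python snippet matches the sample line above. The actual regex behind the regexr link isn't reproduced in this thread, so the pattern and group names below are illustrative, not canonical:

```python
import re

# Illustrative single-pass regex over the stripped HAProxy HTTP log line.
haproxy_re = re.compile(
    r'\[(?P<date_time>[^\]]+)\]\s+'
    r'(?P<frontend_name>\S+)\s+'
    r'(?P<backend_name>[^/\s]+)/(?P<server_name>\S+)\s+'
    r'(?P<request_time>-?\d+)/(?P<queue_time>-?\d+)/(?P<connection_time>-?\d+)'
    r'/(?P<response_time>-?\d+)/(?P<total_time>-?\d+)\s+'
    r'(?P<status_code>\d{3})\s+(?P<response_size>\d+)\s+'
    r'\S+\s+\S+\s+(?P<flags>\S+)\s+'
    r'(?P<process_conn>\d+)/(?P<frontend_conn>\d+)/(?P<backend_conn>\d+)'
    r'/(?P<server_conn>\d+)/(?P<retries>\d+)\s+'
    r'(?P<server_queue>\d+)/(?P<backend_queue>\d+)\s+'
    r'\{(?P<req_header>[^}]*)\}\s+\{(?P<resp_header>[^}]*)\}\s+'
    r'"(?P<method>\S+)\s+(?P<uri>\S+)'
)

line = ('[06/Feb/2009:12:14:14.655] http-in static/srv1 10/0/30/69/109 '
        '200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} {} "GET /index.html HTTP/1.1"')

m = haproxy_re.search(line)
print(m.group('status_code'), m.group('uri'))  # → 200 /index.html
```

Named groups like these map directly onto Splunk's search-time field extraction, so one EXTRACT (or a REPORT transform with this REGEX) pulls every field in a single pass.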
The end result is that haproxy_syslog_stripper is an index-time extraction that overwrites _raw with its results. Then haproxy_fields is a search-time extraction based on the updated _raw. One happens when the data is indexed, and the other happens when the data is searched, so in that case they can rely on each other. Bonus: you shouldn't need the | extract command to get the fields to appear. They should simply be available when you're searching this sourcetype.
Thanks for the clarification! I thought everything in props essentially ran concurrently.
That being the case, this should work in props.conf (use either the source:: stanza name or the one below, depending on your incoming data):
[HAProxy]
MAX_TIMESTAMP_LOOKAHEAD=40
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=false
TZ = US/Mountain
TRANSFORMS-syslogstripper = haproxy_syslog_stripper
REPORT-haproxyfieldextract = haproxyfields, clientinfofields, backendfields, requestinfo, connectioninfo, queueinfo, uriinfo
If this doesn't work, can you provide example logs?
A comment regarding the statement - "It's also important to note that you really can't have extractions dependent on other extractions, as they don't execute in sequence."
This is not true - in fact, the opposite holds and is used by many apps, including apps made by Splunk themselves. Extractions run in the sequence specified by the order in which they're called in a REPORT statement, so if you have "REPORT = extraction1, extraction2", extraction2 will run after extraction1 and can make use of the field(s) extraction1 created.
What are you trying to accomplish?