I am trying to use the Splunk App for AWS to monitor Cisco web log files, but I am having a very hard time extracting the fields. The files are tab-separated and a new one is created approximately every half hour, so in theory I should be able to use the header line for the field names. That only works when I upload a file manually in a test environment, not while monitoring. The problem with manual extraction is that not every field is populated in every event. I tried the following settings in props.conf, but they only pull in the header line.
Sample logs here : http://pastebin.com/5t7xjt41
[aws_s3]
FIELD_DELIMITER = tab
HEADER_FIELD_DELIMITER = tab
FIELD_NAMES = "datatime","c-ip","cs(X-Forwarded-For)","cs-username","cs-method","cs-uri-scheme","cs-host","cs-uri-port","cs-uri-path","cs-uri-query","cs(User-Agent)","cs(Content-Type)","cs-bytes","sc-bytes","sc-status","sc(Content-Type)","s-ip","x-ss-category","x-ss-last-rule-name","x-ss-last-rule-action","x-ss-block-type","x-ss-block-value","x-ss-external-ip","x-ss-referer-host"
INDEXED_EXTRACTIONS = tsv
KV_MODE = none
sourcetype = cws:proxy
TZ = EST
NO_BINARY_CHECK = true
disabled = false
BREAK_ONLY_BEFORE_DATE = true
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Structured
description = Tab-separated value format. Set header and other settings in "Delimited Settings"
disabled = false
pulldown_type = true
Thomas, is the problem with the log data you're getting, or is the data (proxy W3C logs) there but the field extraction failing? Since you mention a Cisco proxy, I'm assuming you're using Cisco CWS / ScanSafe?
Almost all of the log data I get (using sourcetype = aws:s3) from Scansafe via AWS is:
#Fields: datatime c-ip cs(X-Forwarded-For) cs-username cs-method cs-uri-scheme cs-host cs-uri-port cs-uri-path cs-uri-query cs(User-Agent) cs(Content-Type) cs-bytes sc-bytes sc-status sc(Content-Type) s-ip x-ss-category x-ss-last-rule-name x-ss-last-rule-action x-ss-block-type x-ss-block-value x-ss-external-ip x-ss-referer-host
Beyond that I see little more than extra "useless" data, but try this:
[aws_s3]
FIELD_DELIMITER = \t
HEADER_FIELD_DELIMITER = \t
HEADER_FIELD_LINE_NUMBER = 1
INDEXED_EXTRACTIONS = tsv
KV_MODE = none
sourcetype = cws:proxy
TZ = EST
NO_BINARY_CHECK = true
BREAK_ONLY_BEFORE_DATE = true
SHOULD_LINEMERGE = false
category = Structured
description = Tab-separated value format. Set header and other settings in "Delimited Settings"
pulldown_type = true
http://docs.splunk.com/Documentation/Splunk/6.2.3/Data/Extractfieldsfromfileheadersatindextime
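To make the idea behind header-based extraction concrete, here is a minimal Python sketch of what it amounts to: read the names from the "#Fields:" line, then map each tab-separated event onto them. This is not how Splunk implements it. The function names and the sample event are hypothetical, and the header is shortened to four columns for readability.

```python
# Hypothetical, shortened sample: a "#Fields:" header and one event
# whose cs-username column is empty.
HEADER = "#Fields: datatime\tc-ip\tcs-username\tcs-method"
EVENT = "2015-06-01 12:00:00\t10.0.0.1\t\tGET"


def parse_header(line):
    """Return the field names listed on a '#Fields:' header line."""
    prefix = "#Fields:"
    if not line.startswith(prefix):
        raise ValueError("not a '#Fields:' header line")
    # The names contain no whitespace, so split() copes with both
    # tab-separated and space-separated headers.
    return line[len(prefix):].split()


def parse_event(line, names):
    """Map one tab-separated event onto the header names."""
    # Splitting on "\t" keeps empty columns as empty strings.
    values = line.rstrip("\n").split("\t")
    if len(values) != len(names):
        raise ValueError("expected %d columns, got %d" % (len(names), len(values)))
    return dict(zip(names, values))


names = parse_header(HEADER)
event = parse_event(EVENT, names)
print(event["c-ip"], repr(event["cs-username"]))
```

Splitting on the delimiter instead of matching each value means an empty column simply becomes an empty string, which is the case that makes manual extraction awkward.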
Thanks, but I have already tried this and I'm not sure why it isn't working. I ended up just using a regex extraction in props.conf:
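EXTRACT-datatime,c_ip,cs_X_Forwarded_For,cs_username,cs_method,cs_uri_scheme,cs_host,cs_uri_port,cs_uri_path,cs_uri_query,cs_user_agent,cs_content_type,cs_bytes,sc_bytes,sc_status,sc_content_type,s_ip,x_ss_category,x_ss_last_rule_name,x_ss_last_rule_action,x_ss_block_type,x_ss_block_value,x_ss_external_ip,x_ss_referer_host = ^(?P<datatime>[^\t]+)\t(?P<c_ip>[^\t]+)\t(?P<cs_X_Forwarded_For>[^\t]+)\t(?P<cs_username>[^\t]+)\t(?P<cs_method>[^\t]+)\t(?P<cs_uri_scheme>[^\t]+)\t(?P<cs_host>[^\t]+)\t(?P<cs_uri_port>[^\t]+)\t(?P<cs_uri_path>[^\t]+)\t(?P<cs_uri_query>[^\t]+)\t(?P<cs_user_agent>[^\t]+)\t(?P<cs_content_type>[^\t]+)\t(?P<cs_bytes>[^\t]+)\t(?P<sc_bytes>[^\t]+)\t(?P<sc_status>[^\t]+)\t(?P<sc_content_type>[^\t]+)\t(?P<s_ip>[^\t]+)\t(?P<x_ss_category>[^\t]+)\t(?P<x_ss_last_rule_name>[^\t]+)\t(?P<x_ss_last_rule_action>[^\t]+)\t(?P<x_ss_block_type>[^\t]+)\t(?P<x_ss_block_value>[^\t]+)\t(?P<x_ss_external_ip>[^\t]+)\t(?P<x_ss_referer_host>[^\t]+)$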
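A regex like this is easy to try outside Splunk first. The Python sketch below assumes each group is meant to be a named capture of non-tab characters, one per column; it uses Python's re module rather than Splunk's PCRE engine, a shortened hypothetical field list, and made-up sample lines. It also compares the + quantifier with *, because the original question mentions that not every field is filled in every event.

```python
import re

# Shortened, hypothetical subset of the 24 field names.
FIELDS = ["datatime", "c_ip", "cs_username", "cs_method"]


def build_pattern(fields, quantifier):
    """Build an anchored regex with one named group per tab-separated column."""
    groups = [r"(?P<%s>[^\t]%s)" % (name, quantifier) for name in fields]
    return re.compile("^" + r"\t".join(groups) + "$")


strict = build_pattern(FIELDS, "+")   # every column must be non-empty
lenient = build_pattern(FIELDS, "*")  # columns may be empty

# Made-up events: one fully populated, one with an empty cs_username.
full = "2015-06-01 12:00:00\t10.0.0.1\tjdoe\tGET"
sparse = "2015-06-01 12:00:00\t10.0.0.1\t\tGET"

print(strict.match(full) is not None)    # True
print(strict.match(sparse) is not None)  # False
print(lenient.match(sparse).groupdict())
```

With +, an event that has any empty column does not match at all, so none of its fields are extracted. If the real data contains empty columns rather than a placeholder value, switching the quantifier to * is one way to keep those events.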
Are you using Splunk or Hunk? If you're using Splunk, the configuration from your original question needs to be placed on an indexer. The EXTRACT setting in your comment above is a search-time field extraction; it lives on the search head and should work in both Splunk and Hunk.