I am attempting to use the INDEXED_EXTRACTIONS = W3C configuration to pull logs from a Microsoft TMG server. I started with the isamonitor app that exists for ISA 2006 and built a new sourcetype on top of it for the TMG logs called tmgwebw3c (based on isawebw3c). The header of the W3C log looks as follows; both the #Fields line and the data itself contain tab-separated values:
#Software: Microsoft Forefront Threat Management Gateway
#Version: 2.0
#Date: 2013-11-22 15:26:48
#Fields: c-ip cs-username c-agent date time s-computername cs-referred r-host r-ip r-port time-taken sc-bytes cs-bytes cs-protocol s-operation cs-uri cs-mime-type s-object-source sc-status rule FilterInfo cs-network sc-network error-info action AuthenticationServer NIS scan result NIS signature ThreatName MalwareInspectionAction MalwareInspectionResult UrlCategory MalwareInspectionContentDeliveryMethod MalwareInspectionDuration MalwareInspectionThreatLevel internal-service-info NIS application protocol NAT address UrlCategorizationReason SessionType UrlDestHost s-port SoftBlockAction
Using the documentation at http://docs.splunk.com/Documentation/Splunk/6.0/Data/Extractfieldsfromfileheadersatindextime I built a sourcetype that looks as follows:
[tmgwebw3c]
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE = false
REPORT-tmgwebw3c = tmgwebw3c
TZ = GMT
INDEXED_EXTRACTIONS = W3C
FIELD_HEADER_REGEX = ^#Fields:
PREAMBLE_REGEX = ^#\w+:
FIELD_DELIMITER = \t
Everything appears to be working well, except that the very first field is being named "Fields_c_ip" rather than the expected "c_ip". Based on the documentation, FIELD_HEADER_REGEX should not include the matched portion as part of the header line, but it seems to be doing so anyhow.
I tried to remove the PREAMBLE_REGEX also in case they were conflicting, but this did not solve the issue. Any assistance with this would be appreciated.
--
Brian T Glenn
Hurricane Labs
Turns out this is going to be fixed post-6.0.2. Nothing to do in the configuration itself.
My props.conf settings that work with this:
[w3c_tab]
FIELD_DELIMITER=tab
FIELD_HEADER_REGEX=^#Fields:\s*(.*)
MISSING_VALUE_REGEX=-
TIME_FORMAT=%Y-%m-%d %H:%M:%S
TZ=GMT
TIMESTAMP_FIELDS=date,time
Note: I had accidentally escaped the \s in FIELD_HEADER_REGEX.
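To make the role of the capture group in FIELD_HEADER_REGEX a bit more concrete, here is a rough Python sketch. It is not how Splunk implements header parsing; `parse_header`, `clean`, and the `use_capture_group` flag are hypothetical names, and the name normalization (non-alphanumerics to underscores) is only an assumption that happens to turn c-ip into c_ip. It shows that keeping the whole matched line instead of the captured part would produce a first field called "Fields_c_ip", which is the symptom described in the question.

```python
import re

# Same pattern as in the stanza above: the group captures everything after "#Fields:".
HEADER_RE = re.compile(r"^#Fields:\s*(.*)")


def clean(name):
    # Hypothetical normalizer (an assumption, not Splunk's code):
    # collapse runs of non-alphanumeric characters into "_" and trim the ends.
    return re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_")


def parse_header(line, use_capture_group=True):
    """Return cleaned field names from a W3C "#Fields:" line, or None if it is not one."""
    match = HEADER_RE.match(line)
    if match is None:
        return None
    # group(1) is only the field list; group(0) still contains the "#Fields:" prefix.
    raw = match.group(1) if use_capture_group else match.group(0)
    return [clean(part) for part in raw.split("\t")]


line = "#Fields: c-ip\tcs-username\tc-agent\tdate\ttime"
print(parse_header(line))
# ['c_ip', 'cs_username', 'c_agent', 'date', 'time']
print(parse_header(line, use_capture_group=False))
# ['Fields_c_ip', 'cs_username', 'c_agent', 'date', 'time']
```

If the first field in your own data comes out with the prefix attached, the second call is roughly the situation you are in.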
I will definitely give this a shot, but I will not have access to the environment until the end of next month, so I can't be sure just yet. Thanks!
Hi Brian,
Did you try just:
INDEXED_EXTRACTIONS = W3C
Without any other settings? This actually sets the following under the covers:
FIELD_DELIMITER = whitespace
FIELD_HEADER_REGEX = ^#Fields:\s*(.*)
MISSING_VALUE_REGEX = -
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TZ = GMT
TIMESTAMP_FIELDS = date,time
We did have some trouble with tabs and spaces in Internet Security and Acceleration Server and I'm wondering if we'll see the same problems here.
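The tab-versus-space concern can be illustrated with a small Python sketch. It is not Splunk code: `split_fields` is a hypothetical helper, the header is a shortened subset of the #Fields line from the question, and the data row is made up for the example. Because TMG field names such as "NIS scan result" contain spaces, splitting on any whitespace yields more header names than there are values in a row, while splitting on tabs keeps them aligned.

```python
# Shortened subset of the TMG #Fields line; names are tab-separated
# and some of them contain spaces.
header = "c-ip\tNIS scan result\tNIS signature\tNAT address\ts-port"

# Made-up data row with one value per field ("-" stands for a missing value).
row = "10.0.0.1\t-\t-\t-\t80"


def split_fields(text, delimiter):
    """Split a line on any whitespace or strictly on tabs (hypothetical helper)."""
    if delimiter == "whitespace":
        return text.split()
    if delimiter == "tab":
        return text.split("\t")
    raise ValueError(f"unsupported delimiter: {delimiter}")


for delimiter in ("whitespace", "tab"):
    names = split_fields(header, delimiter)
    values = split_fields(row, delimiter)
    print(delimiter, len(names), "names,", len(values), "values")
# whitespace 9 names, 5 values
# tab 5 names, 5 values
```

With the whitespace split, every name after "NIS" would be paired with the wrong value, so a tab delimiter looks like the safer choice for these logs.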