With the ProxySG using the default "bcreportermain_v1" output format, we found that about 5% of our log events got no field extraction at all. In those events, the http_user_agent field was blank (written as a hyphen) and was not quoted, even though this field is normally quoted. We suspected the regex, and that turned out to be the problem.
In the line below, the hyphen just before "2.2.2.2" sits in the http_user_agent position, and as you can see, it's unquoted:
2015-12-02 14:38:17 84 1.1.1.1 - - - OBSERVED "Business/Economy" - 200 TCP_NC_MISS GET text/html;charset=UTF-8 http prod-app.enmetric.com 80 /Command-war/retrieve ?limit=5 - - 2.2.2.2 198 129 - "none" "none"
In the next line, by contrast, you can clearly see the quoted User-Agent field preceding 4.4.4.4:
2015-12-02 14:38:17 1662 1.1.1.2 - - - OBSERVED "Web Ads/Analytics" - 200 TCP_NC_MISS GET image/gif http p.liadm.com 80 /imp ?s=5 - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)" 4.4.4.4 478 982 - "none" "none"
Original transform for bcreporter_v1
(?<date>[^\s]+)\s+(?<time>[^\s]+)\s+(?<time_taken>[^\s]+)\s+(?<c_ip>[^\s]+)\s+(?<cs_username>[^\s]+)\s+(?<cs_auth_group>[^\s]+)\s+(?<x_exception_id>[^\s]+)\s+(?<filter_result>[^\s]+)\s+\"(?<category>[^\"]+)\"\s+(?<http_referrer>[^\s]+)\s+(?<sc_status>[^\s]+)\s+(?<action>[^\s]+)\s+(?<cs_method>[^\s]+)\s+(?<http_content_type>[^\s]+)\s+(?<cs_uri_scheme>[^\s]+)\s+(?<cs_host>[^\s]+)\s+(?<cs_uri_port>[^\s]+)\s+(?<cs_uri_path>[^\s]+)\s+(?<cs_uri_query>[^\s]+)\s+(?<cs_uri_extension>[^\s]+)\s+\"(?<http_user_agent>[^\"]+)\"\s+(?<s_ip>[^\s]+)\s+(?<sc_bytes>[^\s]+)\s+(?<cs_bytes>[^\s]+)\s+\"?(?<x_virus_id>[^\"]+)\"?\s+\"(?<x_bluecoat_application_name>[^\"]+)\"\s+\"(?<x_bluecoat_application_operation>[^\"]+)\"
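To reproduce the problem outside Splunk, here is a minimal Python sketch that runs the transform above against both sample lines. Python's re module spells named groups (?P<name>...) rather than PCRE's (?<name>...), so the sketch converts the pattern with a plain string replace, which is only safe because this regex has no lookbehinds. Using re.search to approximate Splunk's unanchored matching is an assumption, not a claim about how Splunk applies the transform internally.

```python
import re

# The bcreporter_v1 transform from the post, copied verbatim (PCRE named groups).
ORIGINAL_PCRE = (
    r'(?<date>[^\s]+)\s+(?<time>[^\s]+)\s+(?<time_taken>[^\s]+)\s+(?<c_ip>[^\s]+)\s+'
    r'(?<cs_username>[^\s]+)\s+(?<cs_auth_group>[^\s]+)\s+(?<x_exception_id>[^\s]+)\s+'
    r'(?<filter_result>[^\s]+)\s+\"(?<category>[^\"]+)\"\s+(?<http_referrer>[^\s]+)\s+'
    r'(?<sc_status>[^\s]+)\s+(?<action>[^\s]+)\s+(?<cs_method>[^\s]+)\s+'
    r'(?<http_content_type>[^\s]+)\s+(?<cs_uri_scheme>[^\s]+)\s+(?<cs_host>[^\s]+)\s+'
    r'(?<cs_uri_port>[^\s]+)\s+(?<cs_uri_path>[^\s]+)\s+(?<cs_uri_query>[^\s]+)\s+'
    r'(?<cs_uri_extension>[^\s]+)\s+\"(?<http_user_agent>[^\"]+)\"\s+(?<s_ip>[^\s]+)\s+'
    r'(?<sc_bytes>[^\s]+)\s+(?<cs_bytes>[^\s]+)\s+\"?(?<x_virus_id>[^\"]+)\"?\s+'
    r'\"(?<x_bluecoat_application_name>[^\"]+)\"\s+\"(?<x_bluecoat_application_operation>[^\"]+)\"'
)

# Python's re spells named groups (?P<name>...). A blind replace is safe here only
# because this pattern contains no lookbehind assertions ((?<= or (?<!).
TRANSFORM = re.compile(ORIGINAL_PCRE.replace("(?<", "(?P<"))

# The two sample events from the post.
UNQUOTED_UA = '2015-12-02 14:38:17 84 1.1.1.1 - - - OBSERVED "Business/Economy" - 200 TCP_NC_MISS GET text/html;charset=UTF-8 http prod-app.enmetric.com 80 /Command-war/retrieve ?limit=5 - - 2.2.2.2 198 129 - "none" "none"'
QUOTED_UA = '2015-12-02 14:38:17 1662 1.1.1.2 - - - OBSERVED "Web Ads/Analytics" - 200 TCP_NC_MISS GET image/gif http p.liadm.com 80 /imp ?s=5 - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)" 4.4.4.4 478 982 - "none" "none"'


def extract(line):
    """Return the transform's named groups as a dict, or None if nothing matches."""
    m = TRANSFORM.search(line)  # unanchored search, assumed to approximate Splunk
    return m.groupdict() if m else None


for label, line in (("unquoted UA", UNQUOTED_UA), ("quoted UA", QUOTED_UA)):
    fields = extract(line)
    print(label, "->", "no match" if fields is None else fields["http_user_agent"])
```

On these two lines, the unquoted-hyphen event gets no match at all while the quoted one extracts every field, which lines up with the all-or-nothing extraction failure described above.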
Here is the http_user_agent portion on its own:
\"(?<http_user_agent>[^\"]+)\"
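One possible adjustment, sketched in Python and not tested inside Splunk: make the surrounding quotes optional and try a bare hyphen first. This borrows the optional-quote approach the transform already uses for x_virus_id. TOLERANT_UA and the user_agent helper are hypothetical names, and the pattern is a suggestion, not a fix from Blue Coat or the add-on.

```python
import re

# Fragment from the post, converted from PCRE (?<name>) to Python (?P<name>).
STRICT_UA = re.compile(r'\"(?P<http_user_agent>[^\"]+)\"')

# Hypothetical tolerant variant: both quotes are optional, and a bare hyphen is
# tried before the general [^\"]+ branch (same idea as the x_virus_id group).
TOLERANT_UA = re.compile(r'\"?(?P<http_user_agent>-|[^\"]+)\"?')


def user_agent(pattern, token):
    """Return http_user_agent if the whole token matches the pattern, else None."""
    m = pattern.fullmatch(token)
    return m.group("http_user_agent") if m else None


SAMPLES = ['"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)"', '"-"', '-']
for token in SAMPLES:
    print(repr(token), user_agent(STRICT_UA, token), user_agent(TOLERANT_UA, token))
```

Caveat: because both quotes are optional, the tolerant fragment also accepts unbalanced quotes, and an unquoted User-Agent containing spaces would still be ambiguous. Inside the full transform, it relies on the fields that follow it (s_ip and the rest) to reject bad splits.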
Config for "bcreportermain_v1"
date time time-taken c-ip cs-username cs-auth-group x-exception-id sc-filter-result cs-categories cs(Referer) sc-status s-action cs-method rs(Content-Type) cs-uri-scheme cs-host cs-uri-port cs-uri-path cs-uri-query cs-uri-extension cs(User-Agent) s-ip sc-bytes cs-bytes x-virus-id x-bluecoat-application-name x-bluecoat-application-operation
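Since the config is just an ordered, space-separated field list, another way to sanity-check the lines is to split them positionally with a quote-aware tokenizer and pair each token with a config field. This is a different technique from the regex transform: it uses Python's shlex module, not anything Splunk or the add-on ships. It also assumes that quotes only wrap values containing spaces and that values carry no backslashes or embedded quotes. parse_by_position is a hypothetical helper.

```python
import shlex

# Field list from the "bcreportermain_v1" config, in log order (27 fields).
FIELDS = ("date time time-taken c-ip cs-username cs-auth-group x-exception-id "
          "sc-filter-result cs-categories cs(Referer) sc-status s-action cs-method "
          "rs(Content-Type) cs-uri-scheme cs-host cs-uri-port cs-uri-path cs-uri-query "
          "cs-uri-extension cs(User-Agent) s-ip sc-bytes cs-bytes x-virus-id "
          "x-bluecoat-application-name x-bluecoat-application-operation").split()


def parse_by_position(line):
    """Split on whitespace, honouring double quotes, and pair tokens with FIELDS.

    Assumption: quotes only wrap values that contain spaces; shlex (POSIX mode)
    would treat backslashes and embedded quotes specially.
    """
    tokens = shlex.split(line)  # quotes are removed from quoted tokens
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(tokens)}")
    return dict(zip(FIELDS, tokens))


# Sample event from the post with the unquoted hyphen User-Agent.
SAMPLE = '2015-12-02 14:38:17 84 1.1.1.1 - - - OBSERVED "Business/Economy" - 200 TCP_NC_MISS GET text/html;charset=UTF-8 http prod-app.enmetric.com 80 /Command-war/retrieve ?limit=5 - - 2.2.2.2 198 129 - "none" "none"'

fields = parse_by_position(SAMPLE)
print(fields["cs(User-Agent)"], fields["s-ip"])  # prints: - 2.2.2.2
```

Because a bare hyphen is still a single token, it lands in cs(User-Agent) whether or not it is quoted, and the remaining fields stay aligned with the config.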
We're not sure whether the fix belongs on the ProxySG side (so the field is always quoted) or in the regex. Has anyone else noticed this?