After realizing that the hostname of a Blue Coat appliance appears at the end of each incoming event, we created a host name extraction in the props and transforms of our modified Blue Coat TA to extract the correct x_bluecoat_appliance_name. We verified that the regex works by testing with | rex field=_raw "OUR_REGEX" and by testing externally on regex101.com. Both tests were 100% successful.
When the transform was applied to new incoming data, however, the extraction failed on all lengthy messages.
Sample data and Props/Transforms are below.
Is there a maximum length that a regex will search for index-time extractions? It appears that way.
props.conf
[bcoat_proxysg]
TRANSFORMS-hostchange = bluecoat_host
transforms.conf
[bluecoat_host]
#we tried both of these regexes - starting with the lookbehind ...
REGEX = \"(\S+)\"\s{1,3}\S+\s\S+(?<=$)
#then the one starting from the beginning of the message - as inefficient as it may be.
#REGEX = \S+\s+\S+\s+\S+\s+\S+\s+\S+\s\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\"(.+)\"\s+\S+\s+\S+
DEST_KEY = MetaData:Host
FORMAT = host::$1
REGEX = (?<date>\S+)\s+(?<time>\S+)\s+(?<time_taken>\S+)\s+(?<c_ip>\S+)\s+(?<sc_status>\S+)\s(?<s_action>\S+)\s+(?<sc_bytes>\S+)\s+(?<cs_bytes>\S+)\s+(?<cs_method>\S+)\s+(?<cs_uri_scheme>\S+)\s+(?<cs_host>\S+)\s+(?<cs_uri_port>\S+)\s+(?<cs_uri_path>\S+)\s+(?<cs_uri_query>\S+)\s+(?<cs_username>\S+)\s+(?<cs_auth_group>\S+)\s+(?<s_hierarchy>\S+)\s+(?<s_supplier_name>\S+)\s+(?<rs_content_type>\S+)\s+(?<cs_referer>\S+)\s+\"?(?<cs_user_agent>.+)\"?\s+(?<sc_filter_result>\S+)\s+\"(?<cs_categories>.+)\"\s+(?<x_virus_id>\S+)\s+(?<s_ip>\S+)\s+(?<c_port>\S+)\s+(?<x_exception_id>\S+)\s+\"(?<cs_category>.+)\"\s+(?<cs_uri_extension>\S+)\s+(?<cs_uri>\S+)\s+(?<s_sitename>\S+)\s+(?<r_ip>\S+)\s+(?<r_dns>\S+)\s+(?<s_session_id>\S+)\s+\"(?<x_bluecoat_appliance_name>.+)\"\s+(?<x_cache_info>\S+)\s+(?<x_rs_streaming_content>\S+)
FIELDS Listing from Headers
FIELDS="date","time","time_taken","c_ip","sc_status","s_action","sc_bytes","cs_bytes","cs_method","cs_uri_scheme","cs_host","cs_uri_port","cs_uri_path","cs_uri_query","cs_username","cs_auth_group","s_hierarchy","s_supplier_name","rs_content_type","cs_referer","cs_user_agent","sc_filter_result","cs_categories","x_virus_id","s_ip","c_port","x_exception_id","cs_category","cs_uri_extension","cs_uri","s_sitename","r_ip","r_dns","s_session_id","x_bluecoat_appliance_name","x_cache_info","x_rs_streaming_content"
sample data
SHORT MSG:
2015-06-05 12:14:47 44 8.8.8.8 200 TCP_HIT 1206 2324 GET http data.t.bleacherreport.com 80 /jsonp/MLB_Reg/Baseball/2015/6/5/e19d5d29-13ce-4f34-b3c3-c13814afe6da/line_scores.json ?callback=BRLineScore_isOver_76524 username - - 8.8.8.8 application/javascript http://bleacherreport.com/articles/2484112-the-miami-heat-in-surprising-showdown-simply-cant-afford-to-lose-dwyane-wade "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/7.0; NISSC)" OBSERVED "Sports/Recreation" - 4.4.4.4 62169 - "Sports/Recreation" json http://data.t.bleacherreport.com/jsonp/MLB_Reg/Baseball/2015/6/5/e19d5d29-13ce-4f34-b3c3-c13814afe6da/line_scores.json?callback=BRLineScore_isOver_76524 SG-HTTP-Service 157.166.249.67 data.t.bleacherreport.com - "bc-appliance-hostname" - -
LONG MSG:
2015-06-05 12:05:07 92 8.8.8.8 200 TCP_NC_MISS 715 2636 GET http webstats.americanbar.org 80 /b/ss/abajournalproduction/1/H.22.1/s24794675480276??AQB=1&ndh=1&t=5%2F5%2F2015%208%3A5%3A6%205%20240&ce=UTF-8&ns=americanbarassociation&g=http%3A%2F%2Fwww.abajournal.com%2Fnews%2Farticle%2Fdoes_blackhawks_jersey_ban_violate_the_first_amendment_maybe_law_prof_says%2F%3Futm_source%3Dinternal%26utm_medium%3Dnavigation%26utm_campaign%3Dmost_read&r=http%3A%2F%2Fwww.abajournal.com%2Fnews%2Farticle%2Fman_sued_for_sawing_neighbors_garage_in_half_isnt_liable_judge_rules%2F%3Futm_source%3Dinternal%26utm_medium%3Dnavigation%26utm_campaign%3Dmost_read&cc=USD&c1=http%3A%2F%2Fwww.abajournal.com%2Fnews%2Farticle%2Fdoes_blackhawks_jersey_ban_violate_the_first_amendment_maybe_law_prof_says%2F%3Futm_source%3Dinternal%26utm_medium%3Dnavigation%26utm_campaign%3Dmost_read&c2=http%3A%2F%2Fwww.abajournal.com%2Fnews%2Farticle%2Fman_sued_for_sawing_neighbors_garage_in_half_isnt_liable_judge_rules%2F%3Futm_source%3Dinternal%26utm_medium%3Dnavigation%26utm_campaign%3Dmost_read&c3=news&c4=article&c5=does_blackhawks_jersey_ban_violate_the_first_amendment_maybe_law_prof_says&c19=NOT%20SECURE&c20=NO%20404%20ERROR&c25=Not%20Logged%20In&c28=OTHER&c29=www.abajournal.com&c32=NON-MEMBER&c33=http%3A%2F%2Fwww.abajournal.com%2Fnews%2Farticle%2Fdoes_blackhawks_jersey_ban_violate_the_first_amendment_maybe_&c34=law_prof_says%2F%3Futm_source%3Dinternal%26utm_medium%3Dnavigation%26utm_campaign%3Dmost_read&c36=http%3A%2F%2Fwww.abajournal.com%2Fnews%2Farticle%2Fman_sued_for_sawing_neighbors_garage_in_half_isnt_liable_judg&c37=e_rules%2F%3Futm_source%3Dinternal%26utm_medium%3Dnavigation%26utm_campaign%3Dmost_read&c42=ABA%20Non-Store%20Page&c47=D%3Dip-address&s=1920x1200&c=24&j=1.6&v=Y&k=Y&bw=1920&bh=1009&p=Shockwave%20Flash%3BSilverlight%20Plug-In%3B&pid=http%3A%2F%2Fwww.abajournal.com%2Fnews%2Farticle%2Fman_sued_for_sawing_neighbors_garage_in_half_isnt_liable_judge_rules%2F%3Futm_source%3Dinternal%26utm_medium%3Dnaviga
tion%26utm_campaign%3Dmost_read&oid=http%3A%2F%2Fwww.abajournal.com%2Fnews%2Farticle%2Fdoes_blackhawks_jersey_ban_violate_the_first_amendment_maybe_&ot=A&oi=625&AQE=1 megodm - - webstats.americanbar.org image/gif http://www.abajournal.com/news/article/does_blackhawks_jersey_ban_violate_the_first_amendment_maybe_law_prof_says/?utm_source=internal&utm_medium=navigation&utm_campaign=most_read "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; NISSC; rv:11.0) like Gecko" OBSERVED "Education;Government/Legal" - 8.8.8.8 59406 - "Education" - http://webstats.americanbar.org/b/ss/abajournalproduction/1/H.22.1/s24794675480276?AQB=1&ndh=1&t=5%2F5%2F2015%208%3A5%3A6%205%20240&ce=UTF-8&ns=americanbarassociation&g=http%3A%2F%2Fwww.abajournal.com%2Fnews%2Farticle%2Fdoes_blackhawks_jersey_ban_violate_the_first_amendment_maybe_law_prof_says%2F%3Futm_source%3Dinternal%26utm_medium%3Dnavigation%26utm_campaign%3Dmost_read&r=http%3A%2F%2Fwww.abajournal.com%2Fnews%2Farticle%2Fman_sued_for_sawing_neighbors_garage_in_half_isnt_liable_judge_rules%2F%3Futm_source%3Dinternal%26utm_medium%3Dnavigation%26utm_campaign%3Dmost_read&cc=USD&c1=http%3A%2F%2Fwww.abajournal.com%2Fnews%2Farticle%2Fdoes_blackhawks_jersey_ban_violate_the_first_amendment_maybe_law_prof_says%2F%3Futm_source%3Dinternal%26utm_medium%3Dnavigation%26utm_campaign%3Dmost_read&c2=http%3A%2F%2Fwww.abajournal.com%2Fnews%2Farticle%2Fman_sued_for_sawing_neighbors_garage_in_half_isnt_liable_judge_rules%2F%3Futm_source%3Dinternal%26utm_medium%3Dnavigation%26utm_campaign%3Dmost_read&c3=news&c4=article&c5=does_blackhawks_jersey_ban_violate_the_first_amendment_maybe_law_prof_says&c19=NOT%20SECURE&c20=NO%20404%20ERROR&c25=Not%20Logged%20In&c28=OTHER&c29=www.abajournal.com&c32=NON-MEMBER&c33=http%3A%2F%2Fwww.abajournal.com%2Fnews%2Farticle%2Fdoes_blackhawks_jersey_ban_violate_the_first_amendment_maybe_&c34=law_prof_says%2F%3Futm_source%3Dinternal%26utm_medium%3Dnavigation%26utm_campaign%3Dmost_read&c36=http%3A%2F%2Fwww.abajournal.com%2Fnews%
2Farticle%2Fman_sued_for_sawing_neighbors_garage_in_half_isnt_liable_judg&c37=e_rules%2F%3Futm_source%3Dinternal%26utm_medium%3Dnavigation%26utm_campaign%3Dmost_read&c42=ABA%20Non-Store%20Page&c47=D%3Dip-address&s=1920x1200&c=24&j=1.6&v=Y&k=Y&bw=1920&bh=1009&p=Shockwave%20Flash%3BSilverlight%20Plug-In%3B&pid=http%3A%2F%2Fwww.abajournal.com%2Fnews%2Farticle%2Fman_sued_for_sawing_neighbors_garage_in_half_isnt_liable_judge_rules%2F%3Futm_source%3Dinternal%26utm_medium%3Dnavigation%26utm_campaign%3Dmost_read&oid=http%3A%2F%2Fwww.abajournal.com%2Fnews%2Farticle%2Fdoes_blackhawks_jersey_ban_violate_the_first_amendment_maybe_&ot=A&oi=625&AQE=1 SG-HTTP-Service 4.4.4.4 webstats.americanbar.org - "bc-appliance-hostname" - -
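On the length question above, one possible explanation (a sketch, not a confirmed fix): transforms.conf has a LOOKAHEAD setting for index-time transforms that controls how many characters into the event the REGEX searches, and it defaults to 4096. The long sample message appears to be longer than that, so the appliance name at the end would fall outside the searched range. Assuming that is the cause, raising LOOKAHEAD in the same stanza might look like this. The value 16384 is an illustrative assumption and should be sized to the longest event you expect.

```
# transforms.conf (sketch; the LOOKAHEAD value below is an assumption, not a tested setting)
[bluecoat_host]
# LOOKAHEAD sets how many characters of the event REGEX may search; the default is 4096.
# 16384 is an illustrative value chosen to exceed the longest expected event.
LOOKAHEAD = 16384
REGEX = \"(\S+)\"\s{1,3}\S+\s\S+(?<=$)
DEST_KEY = MetaData:Host
FORMAT = host::$1
```

This would also be consistent with the search-time | rex test and the regex101.com test succeeding, since neither of those is governed by the transforms.conf LOOKAHEAD setting. Because this is an index-time change, it would only affect events indexed after the change is deployed.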