Hello,
I installed the Splunk for Bluecoat app a while ago and have been using it since. I recently noticed errors in my reports, and I think the problem lies in the field extraction mechanism the app uses.
The app uses a space (" ") as the delimiter (DELIMS), which does not account for URLs and URIs that contain the " character. When that happens, the log line is split incorrectly and most fields overlap.
Any idea how I can redefine the URL field without having to redefine all the other fields?
Best regards,
Hourani
I ended up writing a few regexes to extract the fields properly. The default props.conf shipped with the app does not handle these lines correctly and produces a lot of overlapping fields.
Can you post your regex here?
Sure. I haven't extracted every field, but here is what I have so far:
EXTRACT-time_taken,src_ip,code_retour,action,bytes_in,bytes_out,cs_method,dest_host = ^(?:[^:\n]*:){2}\d+\s+(?P<time_taken>\d+)\s+(?P<src_ip>[^ ]+)\s+(?P<code_retour>\d+)[^ \n]* (?P<action>[^ ]+)\s+(?P<bytes_in>[^ ]+)\s+(?P<bytes_out>\d+)\s+(?P<cs_method>[^ ]+)\s+\-\s+(?P<dest_host>[^ ]+)
EXTRACT-http_url = ^(?:[^\s]*\s){12}(?P<http_url>.+?(?=\s\-\s(\w|\-)+\s\-\s(\w|\-)+))
EXTRACT-src_user = \s\-\s(?P<src_user>(\w{6}|\-)?(?=\s\-\s))(?:[^\s]*\s){5}\-\s\-\s(?:[^\s]*\s\".+\"\s[^\s]*\s\".+\")\Z
EXTRACT-filter_result = \s\-\s\-\s(?P<filter_result>[^\s]+?)(?:\s\".+\"\s[^\s]+\s\".+\")\Z
EXTRACT-category = \s\-\s\-\s[^\s]+\s\"(?P<category>[^\"]+?)(?:\"\s[^\s]+\s\".+\")\Z
EXTRACT-proxy_name = \s\"(?P<proxy_name>px[^\"]+?)(?:\")\Z
Here is a sample log line:
2015-04-15 01:02:03 1032 1.2.3.4 200 TCP_MISS userID 154 GET - test.com - http://test.com/thisisatest - - - - test.com HTTP/1.1 - - PROXIED "none" 1031 "proxyname"
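For anyone who wants to check these extractions outside Splunk, here is a minimal Python sketch that runs three of the patterns above (http_url, filter_result, category) against the sample line. It assumes Python's re module treats these patterns the same way Splunk's PCRE engine does, which holds for the constructs used here but is not guaranteed in general. The QUOTED line and the extract helper are made up for illustration. Note that the proxy_name pattern only matches names beginning with "px", so it would not match the anonymized "proxyname" in the sample.

```python
import re

# Patterns copied from the EXTRACT-http_url, EXTRACT-filter_result and
# EXTRACT-category lines posted above.
HTTP_URL = re.compile(
    r'^(?:[^\s]*\s){12}(?P<http_url>.+?(?=\s\-\s(\w|\-)+\s\-\s(\w|\-)+))')
FILTER_RESULT = re.compile(
    r'\s\-\s\-\s(?P<filter_result>[^\s]+?)(?:\s\".+\"\s[^\s]+\s\".+\")\Z')
CATEGORY = re.compile(
    r'\s\-\s\-\s[^\s]+\s\"(?P<category>[^\"]+?)(?:\"\s[^\s]+\s\".+\")\Z')

# The sample log line from the thread.
SAMPLE = ('2015-04-15 01:02:03 1032 1.2.3.4 200 TCP_MISS userID 154 GET - '
          'test.com - http://test.com/thisisatest - - - - test.com HTTP/1.1 '
          '- - PROXIED "none" 1031 "proxyname"')

# Hypothetical variant (not from the thread): the same line with a
# double quote inside the URL, the case that breaks space-delimited parsing.
QUOTED = SAMPLE.replace('thisisatest', 'this"isatest')


def extract(line):
    """Apply each pattern and collect the named groups that matched."""
    fields = {}
    for pattern in (HTTP_URL, FILTER_RESULT, CATEGORY):
        match = pattern.search(line)
        if match:
            fields.update(match.groupdict())
    return fields


print(extract(SAMPLE))
print(extract(QUOTED))
```

Because the URL pattern counts whitespace-separated tokens from the start of the line and the other two anchor on the end of the line, a stray quote inside the URL should not shift the remaining fields.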
Great. This looks similar to my logs. Did you have the same problem with the log header?
I tried to ignore the header lines with an entry in props.conf, but it doesn't work.
My sourcetype is called "bcoat:proxysg" (I also tried "bcoat_proxysg"), but the stanza has no effect:
[bcoat:proxysg]
TZ = Europe/Berlin
HEADER_FIELD_LINE_NUMBER = 2
CHECK_FOR_HEADER = true
Yep, same problem with the header ^^ I simply wrote a SEDCMD that blanks out any line that does not contain the date field, which removes the junk lines and headers:
SEDCMD-<class> = s/^(?!.*\d{4}-\d{2}-\d{2}.*\s).*//g
Let me know if it works for you.
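To see which lines the SEDCMD pattern actually blanks, it can be tried in Python first. This is only a sketch: it assumes each line arrives as its own event and that Python's re behaves like Splunk's regex engine for this pattern. The two header lines are made-up examples in the usual "#Name: value" style, not taken from real logs, and the ANCHORED pattern is a hypothetical stricter variant that is not part of the thread.

```python
import re

# Pattern from the SEDCMD above: blank the line unless it contains a
# yyyy-mm-dd date that is followed, somewhere later, by whitespace.
POSTED = re.compile(r'^(?!.*\d{4}-\d{2}-\d{2}.*\s).*')

# Hypothetical stricter variant (not from the thread): blank the line
# unless it starts with a yyyy-mm-dd date followed by whitespace.
ANCHORED = re.compile(r'^(?!\d{4}-\d{2}-\d{2}\s).*')

# Illustrative input: two made-up header lines and the start of a data line.
FIELDS_HEADER = '#Fields: date time time-taken c-ip'
DATE_HEADER = '#Date: 2015-04-15 01:02:03'
DATA_LINE = '2015-04-15 01:02:03 1032 1.2.3.4 200 TCP_MISS'


def scrub(pattern, line):
    """Return the line after applying the s/<pattern>// substitution."""
    return pattern.sub('', line)


for name, pattern in (('posted', POSTED), ('anchored', ANCHORED)):
    for line in (FIELDS_HEADER, DATE_HEADER, DATA_LINE):
        print(name, repr(line), '->', repr(scrub(pattern, line)))
```

In this sketch the posted pattern keeps a header line that itself contains a date, while the anchored variant blanks it. If such headers show up in your data, anchoring the date at the start of the line may be worth testing.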