Splunk for Blue Coat ProxySG: Using " " as a delimiter, how to redefine a field with the " character to prevent field extraction issues?

DavidHourani
Super Champion

Hello,

I set up the Splunk for Blue Coat app a while ago and have been using it since. I recently noticed errors in my reports, and I think the problem lies in the app's field extraction mechanism.

The app uses " " (a single space) as the DELIMS value and doesn't account for URLs and URIs that contain the " character. When one does, the whole log line is split incorrectly and most fields end up shifted or overlapping.

Any idea how I can redefine the URL field without having to redefine all the other fields?

Best regards,
Hourani

DavidHourani
Super Champion

I ended up writing a few regex-based extractions to pull out the fields properly. The default props.conf shipped with the app does not handle these log lines correctly and produces a lot of overlapping fields.

MOberschelp
Explorer

Can you post your regex here?

DavidHourani
Super Champion

Sure. I don't have every field extracted, but here is what I have so far:

EXTRACT-time_taken,src_ip,code_retour,action,bytes_in,bytes_out,cs_method,dest_host = ^(?:[^:\n]*:){2}\d+\s+(?P<time_taken>\d+)\s+(?P<src_ip>[^ ]+)\s+(?P<code_retour>\d+)[^ \n]* (?P<action>[^ ]+)\s+(?P<bytes_in>[^ ]+)\s+(?P<bytes_out>\d+)\s+(?P<cs_method>[^ ]+)\s+\-\s+(?P<dest_host>[^ ]+)
EXTRACT-http_url = ^(?:[^\s]*\s){12}(?P<http_url>.+?(?=\s\-\s(\w|\-)+\s\-\s(\w|\-)+))
EXTRACT-src_user = \s\-\s(?P<src_user>(\w{6}|\-)?(?=\s\-\s))(?:[^\s]*\s){5}\-\s\-\s(?:[^\s]*\s\".+\"\s[^\s]*\s\".+\")\Z
EXTRACT-filter_result = \s\-\s\-\s(?P<filter_result>[^\s]+?)(?:\s\".+\"\s[^\s]+\s\".+\")\Z
EXTRACT-category = \s\-\s\-\s[^\s]+\s\"(?P<category>[^\"]+?)(?:\"\s[^\s]+\s\".+\")\Z
EXTRACT-proxy_name = \s\"(?P<proxy_name>px[^\"]+?)(?:\")\Z

Here is an example log line:

2015-04-15 01:02:03 1032 1.2.3.4 200 TCP_MISS userID 154 GET - test.com - http://test.com/thisisatest - - - - test.com HTTP/1.1 - - PROXIED "none" 1031 "proxyname"
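
For anyone who wants to sanity-check these extractions outside Splunk, here is a minimal sketch that runs four of the patterns against the example line and against a made-up variant whose URL contains " characters and a space. It assumes Python's re module handles these constructs the same way Splunk's regex engine does; the helper extract() and the QUOTED line are hypothetical, not part of the app. src_user and proxy_name are left out: proxy_name only matches names starting with px, so it would not match the anonymized example.

```python
import re

# Patterns copied verbatim from the EXTRACT- lines above.
# Assumption: Python's re treats these constructs like Splunk's regex engine.
PATTERNS = [
    r'^(?:[^:\n]*:){2}\d+\s+(?P<time_taken>\d+)\s+(?P<src_ip>[^ ]+)\s+'
    r'(?P<code_retour>\d+)[^ \n]* (?P<action>[^ ]+)\s+(?P<bytes_in>[^ ]+)\s+'
    r'(?P<bytes_out>\d+)\s+(?P<cs_method>[^ ]+)\s+\-\s+(?P<dest_host>[^ ]+)',
    r'^(?:[^\s]*\s){12}(?P<http_url>.+?(?=\s\-\s(\w|\-)+\s\-\s(\w|\-)+))',
    r'\s\-\s\-\s(?P<filter_result>[^\s]+?)(?:\s\".+\"\s[^\s]+\s\".+\")\Z',
    r'\s\-\s\-\s[^\s]+\s\"(?P<category>[^\"]+?)(?:\"\s[^\s]+\s\".+\")\Z',
]


def extract(line):
    """Run every pattern against the line and merge the named groups."""
    fields = {}
    for pattern in PATTERNS:
        match = re.search(pattern, line)
        if match:
            fields.update(match.groupdict())
    return fields


# The example log line from the post.
SAMPLE = ('2015-04-15 01:02:03 1032 1.2.3.4 200 TCP_MISS userID 154 GET - '
          'test.com - http://test.com/thisisatest - - - - test.com HTTP/1.1 '
          '- - PROXIED "none" 1031 "proxyname"')

# Hypothetical variant: same line, but the URL contains " and a space.
QUOTED = SAMPLE.replace('http://test.com/thisisatest',
                        'http://test.com/this"is a"test')

for line in (SAMPLE, QUOTED):
    fields = extract(line)
    print(fields['http_url'], fields['dest_host'],
          fields['filter_result'], fields['category'])
```

In both cases the URL is captured whole and the trailing fields keep their values, because the URL pattern stops at the run of " - " separators rather than at the first space or quote.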

MOberschelp
Explorer

Great, this looks similar to my logs. Did you have the same problem with the log headers?
I tried to ignore them with an entry in props.conf, but it doesn't work.

My sourcetype is called "bcoat:proxysg" (I also tried bcoat_proxysg), but the stanza has no effect.

[bcoat:proxysg]
TZ = Europe/Berlin
HEADER_FIELD_LINE_NUMBER = 2
CHECK_FOR_HEADER = true

DavidHourani
Super Champion

Yep, same problem with the header ^^ I simply wrote a SEDCMD that blanks out any line that doesn't contain the date field, which gets rid of the junk lines and headers:

SEDCMD-<class> = s/^(?!.*\d{4}-\d{2}-\d{2}.*\s).*//g
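
To see which lines this pattern actually blanks, here is a small sketch that applies the same substitution in Python. It assumes Python's re behaves like Splunk's regex engine for this lookahead; the header lines and the ANCHORED variant are invented for illustration and are not from the app or this thread. The pattern tests whether a date appears anywhere in the line, so a header that itself contains a date would be kept; the anchored variant only keeps lines that begin with the date.

```python
import re

# The pattern from the SEDCMD above.
# Assumption: Python's re behaves like Splunk's regex engine for this lookahead.
ORIGINAL = r'^(?!.*\d{4}-\d{2}-\d{2}.*\s).*'

# Hypothetical stricter variant (not from the thread):
# keep a line only if it starts with the date.
ANCHORED = r'^(?!\d{4}-\d{2}-\d{2}\s).*'


def blank_junk(line, pattern=ORIGINAL):
    """Mimic s/<pattern>//g on a single line."""
    return re.sub(pattern, '', line)


EVENT = '2015-04-15 01:02:03 1032 1.2.3.4 200 TCP_MISS'
# Invented header lines, assumed to look roughly like the log preamble.
HEADERS = ['#Software: SGOS', '#Date: 2015-04-15 01:02:03']

for line in [EVENT] + HEADERS:
    print(repr(blank_junk(line)), repr(blank_junk(line, ANCHORED)))
```

If your headers never contain a date, the two patterns behave the same and the original is enough.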

Let me know if it works for you.
