Splunk for Blue Coat ProxySG: Using " " as a delimiter, how to redefine a field with the " character to prevent field extraction issues?

DavidHourani
Super Champion

Hello,

I set up the Splunk for Blue Coat app a while ago and have been using it since. I recently noticed errors in my reports, and I think the cause is the field extraction mechanism the app uses.

The app uses " " as the delimiter (DELIMS) and does not account for URLs and URIs that contain the " character. When one does, the whole log line is split incorrectly and most fields overlap.

Any idea how I can redefine the URL field without having to redefine all the other fields?

Best regards,
Hourani

1 Solution

DavidHourani
Super Champion

I ended up writing a few regexes to extract the fields properly. The default props.conf shipped with the app does not handle these lines correctly and leaves a lot of overlapping fields.

MOberschelp
Explorer

Can you post your regex here?

DavidHourani
Super Champion

Sure. I haven't extracted every field, but here is what I have so far.

EXTRACT-time_taken,src_ip,code_retour,action,bytes_in,bytes_out,cs_method,dest_host = ^(?:[^:\n]*:){2}\d+\s+(?P<time_taken>\d+)\s+(?P<src_ip>[^ ]+)\s+(?P<code_retour>\d+)[^ \n]* (?P<action>[^ ]+)\s+(?P<bytes_in>[^ ]+)\s+(?P<bytes_out>\d+)\s+(?P<cs_method>[^ ]+)\s+\-\s+(?P<dest_host>[^ ]+)
EXTRACT-http_url = ^(?:[^\s]*\s){12}(?P<http_url>.+?(?=\s\-\s(\w|\-)+\s\-\s(\w|\-)+))
EXTRACT-src_user = \s\-\s(?P<src_user>(\w{6}|\-)?(?=\s\-\s))(?:[^\s]*\s){5}\-\s\-\s(?:[^\s]*\s\".+\"\s[^\s]*\s\".+\")\Z
EXTRACT-filter_result = \s\-\s\-\s(?P<filter_result>[^\s]+?)(?:\s\".+\"\s[^\s]+\s\".+\")\Z
EXTRACT-category = \s\-\s\-\s[^\s]+\s\"(?P<category>[^\"]+?)(?:\"\s[^\s]+\s\".+\")\Z
EXTRACT-proxy_name = \s\"(?P<proxy_name>px[^\"]+?)(?:\")\Z

Here is an example log line:

2015-04-15 01:02:03 1032 1.2.3.4 200 TCP_MISS userID 154 GET - test.com - http://test.com/thisisatest - - - - test.com HTTP/1.1 - - PROXIED "none" 1031 "proxyname"
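
For anyone who wants to check these patterns outside Splunk, here is a minimal sketch that runs three of them against the example line. It assumes Python's re module behaves like Splunk's PCRE engine for these patterns, which should be verified against real data. The QUOTED line is a hypothetical variant made up for illustration (the example line with quotes added to the URL), and extract() is just a helper name, not anything from the app.

```python
import re

# Patterns copied from the EXTRACT-http_url, EXTRACT-filter_result and
# EXTRACT-category settings above.
# Assumption: Python's re stands in for Splunk's PCRE engine here.
PATTERNS = [
    re.compile(r'^(?:[^\s]*\s){12}(?P<http_url>.+?(?=\s\-\s(\w|\-)+\s\-\s(\w|\-)+))'),
    re.compile(r'\s\-\s\-\s(?P<filter_result>[^\s]+?)(?:\s\".+\"\s[^\s]+\s\".+\")\Z'),
    re.compile(r'\s\-\s\-\s[^\s]+\s\"(?P<category>[^\"]+?)(?:\"\s[^\s]+\s\".+\")\Z'),
]

# The example log line from this post.
SAMPLE = ('2015-04-15 01:02:03 1032 1.2.3.4 200 TCP_MISS userID 154 GET - test.com - '
          'http://test.com/thisisatest - - - - test.com HTTP/1.1 - - PROXIED "none" 1031 "proxyname"')

# Hypothetical variant (not from a real log): the same line with quotes inside the URL.
QUOTED = SAMPLE.replace('http://test.com/thisisatest',
                        'http://test.com/search?q="thisisatest"')


def extract(line):
    """Apply each pattern to the line and collect its named groups."""
    fields = {}
    for pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            # groupdict() returns only the named groups, not the unnamed ones.
            fields.update(match.groupdict())
    return fields


if __name__ == "__main__":
    for line in (SAMPLE, QUOTED):
        print(extract(line))
```

The URL pattern anchors on the field position (12 space-terminated tokens) and on the " - x - y" run that follows the URL, so it does not depend on how quotes are balanced inside the URL itself.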

MOberschelp
Explorer

Great, this looks similar to my logs. Did you have the same problem with the log header?
I tried to ignore the header lines with an entry in props.conf, but it doesn't work.

My sourcetype is called "bcoat:proxysg" (I also tried bcoat_proxysg), but the stanza has no effect.

[bcoat:proxysg]
TZ = Europe/Berlin
HEADER_FIELD_LINE_NUMBER = 2
CHECK_FOR_HEADER = true

DavidHourani
Super Champion

Yep, same problem with the header ^^ I simply wrote a SEDCMD that removes any line without the date from the time field, which deletes the junk lines and headers:

SEDCMD-<class> = s/^(?!.*\d{4}-\d{2}-\d{2}.*\s).*//g
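
To see which lines this expression blanks, here is a rough sketch that applies the same regex with Python's re module. It assumes Python's re matches Splunk's sed-style replacement closely enough for this pattern. The header lines are illustrative examples of W3C-style log headers, not taken from this thread, and STRICT is a hypothetical variant rather than what was deployed.

```python
import re

# Pattern copied from the SEDCMD above (the part between "s/" and "//g").
ORIGINAL = re.compile(r'^(?!.*\d{4}-\d{2}-\d{2}.*\s).*')

# Hypothetical stricter variant: keep only lines that begin with the date.
STRICT = re.compile(r'^(?!\d{4}-\d{2}-\d{2}\s).*')

EVENT = '2015-04-15 01:02:03 1032 1.2.3.4 200 TCP_MISS'
HEADER = '#Software: SGOS 6.5.1.1'          # assumed header format
DATE_HEADER = '#Date: 2015-04-15 01:02:03'  # assumed header format


def apply_sed(pattern, line):
    """Mimic s/<pattern>//g on a single line."""
    return pattern.sub('', line)


if __name__ == "__main__":
    for name, pattern in (("original", ORIGINAL), ("strict", STRICT)):
        for line in (EVENT, HEADER, DATE_HEADER):
            print(name, repr(line), "->", repr(apply_sed(pattern, line)))
```

One thing this shows: the original pattern looks for a date anywhere in the line, so a header that itself contains a date (such as a "#Date:" line, if your logs have one) is left alone. Anchoring the lookahead at the start of the line, as in the STRICT variant, would blank it as well.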

Let me know if it works for you.
