Parsing syslog data for httpproxy data in Splunk For Squid

jminihane
New Member

I have just set up Splunk and am trying to get my http proxy (Astaro) data into Splunk for Squid. Astaro does use squid but the syslog data isn't in the standard squid format. I can get the syslog data into Splunk and see it via a new UDP:514 input, but I'm having trouble with getting the data visible in Splunk For Squid.
Here is a typical syslogd entry:

Mar 31 20:26:00 10.10.40.10 2012:03:31-20:25:50 httpproxy[14016]: id="0001" severity="info" sys="SecureWeb" sub="http" name="http access" action="pass" method="GET" srcip="10.10.30.101" dstip="67.195.186.237" user="" statuscode="200" cached="0" profile="PROFILENAMEHERE" filteraction="DefaultHTTPAction" size="403" request="0x7cd39000" url="<URLISHERE>" exceptions="" error="" country="United States" category="122,157" reputation="neutral" categoryname="Instant Messaging,Web Phone" content-type="application/json"

Here are my props.conf contents.

[squid]
TIME_FORMAT = %s.%3N
MAX_TIMESTAMP_LOOKAHEAD = 15
KV_MODE = none
SHOULD_LINEMERGE = false
REPORT-squid = squid
[source::udp:514]
TRANSFORMS-sqsourcetype= sq_sourcetyper

and transforms.conf...

[squid]
REGEX =^d+.d+s+(d+)s+([0-9.])s+([^/]+)/(d+)s+(d+)s+(w+)s+((?:([^:])://)?([^/:]+):?(d+)?(/?[^]))s+(S+)s+([^/]+)/([^ ]+)s+(.)$
FORMAT = id::$1 severity::$2 sys::$3 sub::$4 name::$5 action::$6 method::$7 srcip::$8 dstip::$9 user::$10 statuscode::$11 cached::$12 profile::$13 filteraction::$14 size::$15 request::$16 url::$17 exceptions::$18 error::$19 country::$20 category::$21 reputation::$22 categoryname::$23 content-type::$24

[sq_sourcetyper] 
SOURCE_KEY = MetaData:Host 
REGEX = httpproxy
DEST_KEY = MetaData:Sourcetype 
FORMAT= sourcetype::squid

When adding a data source I can't see the "squid" sourcetype anywhere.

I'm guessing that my transforms.conf REGEX is wrong, but how do I get the data to show up in Splunk For Squid?
Markdown may have messed up the formatting.

gkanapathy
Splunk Employee

Yes. Almost certainly, your host (MetaData:Host) value is not httpproxy but 10.10.40.10. Unfortunately, this kind of chaining of timestamps and hostnames is an inherent problem with syslog, which doesn't specify the host in the data itself. You can try matching on 10.10.40.10 in your host-based REGEX instead of httpproxy. If that's undesirable, you can try instead:

[sq_sourcetyper] 
REGEX = ^(?:\S+\s+){5}httpproxy
DEST_KEY = MetaData:Sourcetype 
FORMAT= sourcetype::squid

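which omits SOURCE_KEY, so the REGEX runs against the raw event text (the default) and looks for httpproxy after the five leading timestamp and host tokens.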
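
For the first option (matching on the host value rather than the raw text), a minimal sketch could look like the following. It assumes 10.10.40.10, taken from the sample event, is the only device whose events should be retyped; note that it would retype everything from that host, not only the httpproxy lines, which is one reason it may be undesirable.

```
# transforms.conf (sketch, not a tested config)
[sq_sourcetyper]
SOURCE_KEY = MetaData:Host
# MetaData:Host values carry a host:: prefix, so the pattern is left unanchored
REGEX = 10\.10\.40\.10
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::squid
```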


Also, your squid rule is unnecessarily complicated and inflexible. Instead, use this in props.conf:

[squid]
TIME_FORMAT = %s.%3N
MAX_TIMESTAMP_LOOKAHEAD = 15
KV_MODE = auto
SHOULD_LINEMERGE = false

The REGEX is slower and more complicated than it needs to be. Since the events are already key="value" pairs, KV_MODE = auto extracts them at search time without it. If that doesn't work for you for some reason, you could try keeping your original props.conf, but changing the transforms.conf to:

[squid]
DELIMS = " ", "="

but it should work with the simpler config.
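
Pulling the simpler variant together, props.conf might end up roughly as below. This is a sketch under two assumptions: the [source::udp:514] stanza from the question stays in place (it is what invokes sq_sourcetyper), and REPORT-squid is dropped because KV_MODE = auto takes over field extraction.

```
# props.conf (sketch, not a tested config)
[squid]
TIME_FORMAT = %s.%3N
MAX_TIMESTAMP_LOOKAHEAD = 15
KV_MODE = auto
SHOULD_LINEMERGE = false

# Unchanged from the question: routes UDP 514 events through the sourcetyper
[source::udp:514]
TRANSFORMS-sqsourcetype = sq_sourcetyper
```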

Another option to consider would be to modify your Splunk input config (inputs.conf):

[udp:514]
no_appending_timestamp = true

which will prevent Splunk from adding the extra timestamp and host to the data. If you do this, you should modify your raw matching regex to ^\S+\s+httpproxy, since you don't need to match on the extra components.
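
As a sketch of how that variant fits together, assuming each event then begins with the device's own timestamp exactly as in the sample (2012:03:31-20:25:50 httpproxy[14016]: ...):

```
# inputs.conf
[udp:514]
no_appending_timestamp = true

# transforms.conf
[sq_sourcetyper]
# Only one token (the device timestamp) now precedes the process name
REGEX = ^\S+\s+httpproxy
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::squid
```

If the device sends anything else ahead of its timestamp, the token count in the REGEX would need adjusting to match.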
