Splunk for Squid: debugging parsing / search

bhu731
Explorer

I have installed SplunkforSquid on Splunk version 4.1.6, build 89596, on FreeBSD 8.1.

Sample log line from Squid:

1296200057.055 19 lucas.mwrwin2k.se TCP_MISS/200 91754 GET http://material.svtplay.se/content/1/c8/02/29/43/77/antikmagasinet516.jpg - DIRECT/82.99.28.50 image/jpeg

If I search for the host or URL, the record is not found, but if I search for sourcetype=squid, the record is there.

This is true for many records. Any thoughts as to what the problem might be, or how I can debug it?

Ayn
Legend

Did you search for the whole URL or just part of it? If only part of it, did you use wildcards to cover the rest of the field value?

The reason I'm asking is that the request search form in SplunkforSquid doesn't currently use implicit wildcards, so if you search for, say, "svtplay" in the site field, field values such as "material.svtplay.se" will NOT match. You will have to search for *svtplay* instead.
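As a rough analogy only (this uses Python's fnmatch module, not Splunk's own matching code, and ignores Splunk's case-handling rules), the difference between a bare term and a wildcarded one looks like this:

```python
from fnmatch import fnmatchcase

site = "material.svtplay.se"

# A bare term is compared against the whole field value, so it does not match.
print(fnmatchcase(site, "svtplay"))    # False
# Wildcards on both sides let the term match anywhere inside the value.
print(fnmatchcase(site, "*svtplay*"))  # True
```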

If that's not the case, it sounds like there is an issue with field extraction. If you use the standard search interface, do fields such as "uri", "clientip" and "duration" show up? If they don't, something's fishy. I'll have a look at the sample event you provided and see if there are any problems.

Edit: It seems the reason you're not getting any results is that your log format differs from the default Squid log format, which is what SplunkforSquid expects. In the client address field (the third field, after the timestamp and the elapsed time), your log contains the client's FQDN (lucas.mwrwin2k.se) instead of its numerical IP address. You could either switch Squid back to the standard format in its configuration (for example, by setting log_fqdn off in squid.conf), or modify the regex that SplunkforSquid uses to extract the fields. It's in $SPLUNK_HOME/etc/apps/SplunkforSquid/default/transforms.conf.

The default transforms.conf looks like this:

[squid]
REGEX = ^\d+\.\d+\s+(\d+)\s+([0-9\.]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$
FORMAT = duration::$1 clientip::$2 action::$3 http_status::$4 bytes::$5 method::$6 uri::$7 proto::$8 uri_host::$9 uri_port::$10 uri_path::$11 username::$12 hierarchy::$13 server_ip::$14 content_type::$15

Override it by creating a transforms.conf in $SPLUNK_HOME/etc/apps/SplunkforSquid/local containing:

[squid]
REGEX = ^\d+\.\d+\s+(\d+)\s+([\w\d\.]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$
FORMAT = duration::$1 clientip::$2 action::$3 http_status::$4 bytes::$5 method::$6 uri::$7 proto::$8 uri_host::$9 uri_port::$10 uri_path::$11 username::$12 hierarchy::$13 server_ip::$14 content_type::$15

Note, though, that this makes the field name a bit misleading: "clientip" will now contain FQDNs instead of IP addresses. This shouldn't affect the app's functionality, though; it's just a question of semantics.
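To check the diagnosis outside Splunk, here is a minimal Python sketch that runs both REGEX lines against the sample event and maps the capture groups to the FORMAT field names ($1 to $15). It assumes Python's re module treats this particular pattern the same way Splunk's PCRE engine does (the pattern uses no PCRE-only syntax); the helper name extract is hypothetical.

```python
import re

# Sample event from the question.
SAMPLE = ("1296200057.055 19 lucas.mwrwin2k.se TCP_MISS/200 91754 GET "
          "http://material.svtplay.se/content/1/c8/02/29/43/77/antikmagasinet516.jpg "
          "- DIRECT/82.99.28.50 image/jpeg")

# Field names in the order of the FORMAT line ($1 .. $15).
FIELDS = ["duration", "clientip", "action", "http_status", "bytes", "method",
          "uri", "proto", "uri_host", "uri_port", "uri_path", "username",
          "hierarchy", "server_ip", "content_type"]

# Default stanza: the client field allows only digits and dots.
DEFAULT_REGEX = re.compile(
    r"^\d+\.\d+\s+(\d+)\s+([0-9\.]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+"
    r"((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$")

# Local override: the client field also accepts word characters.
OVERRIDE_REGEX = re.compile(
    r"^\d+\.\d+\s+(\d+)\s+([\w\d\.]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+"
    r"((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$")


def extract(regex, line):
    """Return a FORMAT-style field dict, or None if the regex does not match."""
    m = regex.match(line)
    if m is None:
        return None
    return dict(zip(FIELDS, m.groups()))


if __name__ == "__main__":
    print("default: ", extract(DEFAULT_REGEX, SAMPLE))
    fields = extract(OVERRIDE_REGEX, SAMPLE)
    print("override:", fields["clientip"], fields["uri_host"], fields["http_status"])
```

The default pattern returns None for this event because [0-9\.]* cannot consume "lucas", while the override extracts clientip as lucas.mwrwin2k.se. One caveat: [\w\d\.] does not include the hyphen, so a client FQDN such as a hypothetical my-host.example.com would still fail to match; [\w.-]* or \S+ in that position would cover it.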

bhu731
Explorer

OK, I take it all back! After clearing the junk out of the log file, it works fine.

It seems this file was originally in HTTP format, so of course nothing matched; then HTTP format was switched off and fully qualified domain name logging was switched on. So it's no surprise that it wasn't matching anything like 100%.

Thank you once again for your help.

Ayn
Legend

No problem. I'm the author of the SplunkforSquid app, and any input at all is appreciated!

Could you please mark the question as solved? Otherwise it will keep showing up as unanswered. Thanks!

bhu731
Explorer

OK, thanks for that, but there still seems to be something a wee bit flaky with the regex. I cobbled together one of my own that succeeds for 90% of my log file (750k records), but I will do some more tests tonight and see if I can isolate which lines work and which don't. FYI, the Squid version is 3.1.10. Thanks for the prompt reply.
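One way to isolate the lines that fail is to run the extraction regex over the log outside Splunk and collect the non-matching lines. A minimal Python sketch, assuming the log is available as a local text file passed on the command line, and that Python's re behaves like Splunk's PCRE for this pattern; the function names find_unmatched and report are hypothetical.

```python
import re
import sys

# Override regex from the local transforms.conf stanza.
SQUID_REGEX = re.compile(
    r"^\d+\.\d+\s+(\d+)\s+([\w\d\.]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+"
    r"((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$")


def find_unmatched(lines, regex=SQUID_REGEX, limit=20):
    """Count non-blank lines and matches; keep up to `limit` (line_number, text) failures."""
    total = matched = 0
    failures = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        total += 1
        if regex.match(line):
            matched += 1
        elif len(failures) < limit:
            failures.append((number, line))
    return total, matched, failures


def report(path):
    """Print the match ratio and the first failing lines of a log file."""
    with open(path, encoding="utf-8", errors="replace") as log:
        total, matched, failures = find_unmatched(log)
    print(f"{matched}/{total} lines matched")
    for number, line in failures:
        print(f"{number}: {line}")


if __name__ == "__main__" and len(sys.argv) > 1:
    report(sys.argv[1])  # e.g. python check_squid.py /path/to/access.log
```

Eyeballing the first few failures usually shows whether they share a pattern, such as leftover lines in an older log format or hostnames containing characters the client-field class does not allow.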
