Squid log format slightly changed: would it cause my problem?

ericsteed
Engager

I am running Squid 3.1 with an almost stock logformat. I modified it to show the fully qualified domain name of the client instead of its IP address. Here is the logformat directive in my squid.conf file:

logformat squid %ts.%03tu %6tr %>A %Ss/%03>Hs %<st %rm %ru %[un %Sh/%<A %mt

Note the %>A and %<A (FQDN) in place of %>a and %<a (IP address).

I previously had this working, but I didn't like the IP addresses showing up in the dashboard. I wanted to assign "names" via entries in /etc/hosts on the Squid server so that clients would show up in the dashboard with more meaningful labels. Now the dashboard says it can't find a single entry in my log, even though I have over 100,000 of them! Where should I start looking?

ericsteed
Engager

Ha, I answered my own question. Here's what I came up with:

This accommodates a slightly altered Squid log format when the log is processed by the SplunkforSquid add-on app for Splunk. Normally the client field is an actual IP address. I told Squid to output the FQDN instead, which makes it do a lookup against /etc/hosts and substitute friendly names for the IP addresses. However, the app's REGEX accepts only digits and dots in its second capture group (the client IP). Note that in the Squid output the client is the third field when you count space-delimited fields (see the sample log entries), but the REGEX matches the timestamp without capturing it, so the client ends up in the second capture group. The original REGEX finds no results for lines that carry a hostname, so I had to change it as outlined below:

Sample squid log output (original logformat out of the box):
1400639582.187 14 192.168.1.210 TCP_MISS/200 2497 GET 192.168.1.10:8000/en-US/splunkd/__raw/servicesNS/-/-/search/jobs? - DIRECT/192.168.1.10 application/json

Sample squid log output (modified to be more human friendly):
1400639582.187 14 laptop TCP_MISS/200 2497 GET 192.168.1.10:8000/en-US/splunkd/__raw/servicesNS/-/-/search/jobs? - DIRECT/192.168.1.10 application/json
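To make the "third field versus second capture group" point concrete, here is a minimal Python sketch. It is only an illustration: the pattern is the leading part of the app's REGEX with a simplified `(\S+)` client group, which is an assumption of this example and not what transforms.conf contains.

```python
import re

# Sample line from the post (modified logformat, client shown as a name).
line = ("1400639582.187 14 laptop TCP_MISS/200 2497 GET "
        "192.168.1.10:8000/en-US/splunkd/__raw/servicesNS/-/-/search/jobs? "
        "- DIRECT/192.168.1.10 application/json")

# Counting whitespace-separated fields, the client is the third one (index 2).
fields = line.split()
third_field = fields[2]

# Illustrative prefix only: the timestamp is matched but not captured,
# so the duration is capture group 1 and the client is capture group 2.
prefix = re.compile(r"^\d+.\d+\s+(\d+)\s+(\S+)")
m = prefix.match(line)
duration, client = m.group(1), m.group(2)

print("third space-delimited field:", third_field)
print("capture group 1 (duration):", duration)
print("capture group 2 (client):", client)
```

Both ways of counting point at the same token; only the numbering differs because the timestamp has no parentheses around it.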

File: /opt/splunk/etc/apps/SplunkforSquid/default/transforms.conf

Original REGEX:
REGEX = ^\d+.\d+\s+(\d+)\s+([0-9.]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$

New REGEX:
REGEX = ^\d+.\d+\s+(\d+)\s+([^/]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)\s+((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$
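A quick way to check a change like this before restarting Splunk is to run both patterns against the two sample lines. The sketch below does that with Python's `re` module, which should behave the same as Splunk's PCRE for these constructs. The patterns are written out in full in the code, with `*` quantifiers on the client, protocol, path and content-type groups; treat that as an assumption and compare it against your own transforms.conf. The helper name `client_of` is made up for this example.

```python
import re

# Shared parts of both patterns; only the client group differs.
HEAD = r"^\d+.\d+\s+(\d+)\s+"
TAIL = (r"\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)"
        r"\s+((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))"
        r"\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$")

ORIGINAL = re.compile(HEAD + r"([0-9.]*)" + TAIL)  # client: digits and dots only
NEW = re.compile(HEAD + r"([^/]*)" + TAIL)         # client: anything but "/"

REST = (" TCP_MISS/200 2497 GET "
        "192.168.1.10:8000/en-US/splunkd/__raw/servicesNS/-/-/search/jobs? "
        "- DIRECT/192.168.1.10 application/json")
ip_line = "1400639582.187 14 192.168.1.210" + REST
name_line = "1400639582.187 14 laptop" + REST


def client_of(pattern, line):
    """Return capture group 2 (the client), or None when the line does not match."""
    m = pattern.match(line)
    return m.group(2) if m else None


for label, pattern in (("original", ORIGINAL), ("new", NEW)):
    print(label, "| ip line:", client_of(pattern, ip_line),
          "| name line:", client_of(pattern, name_line))
```

The original pattern rejects the whole line as soon as the client is not made of digits and dots, which is why the dashboard came up empty. A tighter alternative for the client group would be `(\S+)`, but that is a suggestion and not what was used above.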

Field format identifiers:
FORMAT = duration::$1 clientip::$2 action::$3 http_status::$4 bytes::$5 method::$6 uri::$7 proto::$8 uri_host::$9 uri_port::$10 uri_path::$11 username::$12 hierarchy::$13 server_ip::$14 content_type::$15
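The FORMAT line ties each field name to a capture group number. As a rough sketch of that mapping (not Splunk's actual implementation), the Python below parses the FORMAT string and applies it to the modified sample line. The pattern is the new REGEX written with `*` quantifiers, which is an assumption, and the `extract` helper is hypothetical.

```python
import re

REGEX = re.compile(
    r"^\d+.\d+\s+(\d+)\s+([^/]*)\s+([^/]+)/(\d+)\s+(\d+)\s+(\w+)"
    r"\s+((?:([^:]*)://)?([^/:]+):?(\d+)?(/?[^ ]*))"
    r"\s+(\S+)\s+([^/]+)/([^ ]+)\s+(.*)$"
)

FORMAT = ("duration::$1 clientip::$2 action::$3 http_status::$4 bytes::$5 "
          "method::$6 uri::$7 proto::$8 uri_host::$9 uri_port::$10 "
          "uri_path::$11 username::$12 hierarchy::$13 server_ip::$14 "
          "content_type::$15")

line = ("1400639582.187 14 laptop TCP_MISS/200 2497 GET "
        "192.168.1.10:8000/en-US/splunkd/__raw/servicesNS/-/-/search/jobs? "
        "- DIRECT/192.168.1.10 application/json")


def extract(text):
    """Map FORMAT names to capture groups; return {} when the line does not match."""
    m = REGEX.match(text)
    if m is None:
        return {}
    fields = {}
    for pair in FORMAT.split():
        name, group_number = pair.split("::$")
        value = m.group(int(group_number))
        # Optional groups that did not take part in the match are skipped
        # here; that is a choice made for this sketch.
        if value is not None:
            fields[name] = value
    return fields


for name, value in extract(line).items():
    print(f"{name} = {value}")
```

In this sample the URI has no scheme, so the optional protocol group does not participate and `proto` is left out of the result.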

I hope this helps some other newbs like myself. I've just started to use splunk so I'm still getting used to the structure.
