Splunk Search

Combining URL fields in reporting

mikebrittain
Explorer

I'm trying to build a report of slowest pages/scripts on our server based on times for serving those scripts. This will help us track down our worst performing scripts so we can do a bit of performance tuning.

The search I'm using looks like this:

source=".../access.log" | stats avg(response_time) by script_path | sort avg(response_time) desc

The problem with this report is that the top script paths listed include unique IDs, something like this:

/view/item/12345
/view/item/12346
/view/item/12347

I was thinking I could group these together by doing a regex on script_path to replace the digit portion with a single "0" so that the average of response_time is based on all of the similar URLs.

/view/item/0

Having trouble with the search syntax. Any help?

1 Solution

Johnvey
Contributor

For quick and dirty processing, use an inline regex via the rex command. For example, if the URI path in your script_path field usually looks something like:

/<group>/<class>/<object_id>

where you want to generate statistics based on /<group>/<class> and not <object_id>, then add:

source=".../access.log" | rex field=script_path "(?<script_class>(/[^/]+){1,2})"

to your search string. This will generate a new field called script_class that contains only the first two segments of your URI path (or just the first, if the path has only one). You can then operate on script_class just like any other field, so to complete your original search string:

source=".../access.log" 
| rex field=script_path "(?<script_class>(/[^/]+){1,2})"
| stats avg(response_time) by script_class 
| sort avg(response_time) desc

You probably don't want to type this in every time you search, so you can add this permanently to your app via transforms so the field script_class is automatically extracted.

mikebrittain
Explorer

This is a good start. Unfortunately, most of our URLs are not this standardized.

It looks like "rex" will work using mode=sed.
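A minimal sketch of that approach, assuming it is acceptable to collapse every run of digits in script_path to a single "0". In sed mode, rex rewrites the field in place, so the later stats groups on the normalized value. Note that this is a blunt pattern: it would also turn a path like /v2/list into /v0/list, so the expression may need tightening for your URLs.

```
source=".../access.log"
| rex field=script_path mode=sed "s/\d+/0/g"
| stats avg(response_time) by script_path
| sort avg(response_time) desc
```

With this, /view/item/12345, /view/item/12346 and /view/item/12347 are all reported as /view/item/0.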

bfaber
Communicator

Perhaps you could generalize with a field? I don't know if it matches your data, but when I come across something that looks like http:/url/path/here&some_junk&12345&blahblahblah, I often create a field that extracts only the http:/url/path/here part so I can report on that. Make sense?
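As a rough sketch of that idea, assuming the full URL lives in a field called url (a hypothetical name, as is url_base) and that everything up to the first "&" is the part worth reporting on:

```
source=".../access.log"
| rex field=url "^(?<url_base>[^&]+)"
| stats avg(response_time) by url_base
| sort avg(response_time) desc
```

For the example above, url_base would be http:/url/path/here.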

mikebrittain
Explorer

Sadly, our site URLs have pretty wide variations in format and that's not going to work for me.
