Splunk Search

Combining URL fields in reporting

mikebrittain
Explorer

I'm trying to build a report of slowest pages/scripts on our server based on times for serving those scripts. This will help us track down our worst performing scripts so we can do a bit of performance tuning.

The search I'm using looks like this:

source=".../access.log" | stats avg(response_time) by script_path | sort avg(response_time) desc

The problem with this report is that the top script paths listed include unique IDs, something like this:

/view/item/12345
/view/item/12346
/view/item/12347

I was thinking I could group these together by doing a regex on script_path to replace the digit portion with a single "0" so that the average of response_time is based on all of the similar URLs.

/view/item/0

Having trouble with the search syntax. Any help?

1 Solution

Johnvey
Contributor

For quick-and-dirty processing, use an inline regex via the rex command. For example, if the URI paths in your script_path field usually look something like:

/<group>/<class>/<object_id>

where you want to generate statistics based on /<group>/<class> and not <object_id>, then add:

source=".../access.log" | rex field=script_path "(?<script_class>(/[^/]+){1,2})"

to your search string. This will generate a new field called script_class that contains only the first two segments of your URI path (or just the first, if the path has only one). You can then operate on script_class just like any other field, so to complete your original search string:

source=".../access.log" 
| rex field=script_path "(?<script_class>(/[^/]+){1,2})"
| stats avg(response_time) by script_class 
| sort avg(response_time) desc

You probably don't want to type this in every time you search, so you can add the extraction permanently to your app via transforms. The script_class field will then be extracted automatically.

mikebrittain
Explorer

This is a good start. Unfortunately, most of our URLs are not this standardized.

It looks like "rex" will work using mode=sed.
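
A minimal sketch of the sed-mode approach, assuming the unique IDs are purely numeric. In sed mode, rex rewrites script_path in place, so every run of digits becomes a single 0 before stats groups on the field. The pattern is only an assumption about the URL format: it also rewrites digits elsewhere in the path (a segment like /v2/ would become /v0/), and it would need adjusting for IDs that contain letters.

```spl
source=".../access.log"
| rex field=script_path mode=sed "s/\d+/0/g"
| stats avg(response_time) by script_path
| sort avg(response_time) desc
```

With the sample paths from the question, /view/item/12345, /view/item/12346 and /view/item/12347 would all be reported as /view/item/0.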

bfaber
Communicator

Perhaps you could generalize with a field? I don't know if it matches your data, but when I come across something that looks like http:/url/path/here&some_junk&12345&blahblahblah, I often create a field that extracts only the http:/url/path/here portion so I can report on that. Make sense?
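
As a rough sketch of that idea, assuming the raw value is in a field called url and that the variable part always begins at the first & or ?, an inline rex can capture just the leading portion into a new field to report on. The field names url and url_base are hypothetical; response_time and the source are taken from the original question.

```spl
source=".../access.log"
| rex field=url "^(?<url_base>[^&?]+)"
| stats avg(response_time) by url_base
| sort avg(response_time) desc
```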

mikebrittain
Explorer

Sadly, our site URLs have pretty wide variations in format and that's not going to work for me.
