Solved: Combining URL fields in reporting

mikebrittain · ‎05-06-2010

I'm trying to build a report of slowest pages/scripts on our server based on times for serving those scripts. This will help us track down our worst performing scripts so we can do a bit of performance tuning.

The search I'm using looks like this:

source=".../access.log" | stats avg(response_time) by script_path | sort avg(response_time) desc

The problem with this report is that the top script paths listed include unique IDs, something like this:

/view/item/12345
/view/item/12346
/view/item/12347

I was thinking I could group these together by doing a regex on script_path to replace the digit portion with a single "0" so that the average of response_time is based on all of the similar URLs.

/view/item/0

Having trouble with the search syntax. Any help?

Johnvey · ‎05-06-2010

For quick and dirty processing, use an inline regex via the rex command. For example, if your URI path structure, in the field named script_path, is usually something like:

/<group>/<class>/<object_id>

where you want to generate statistics based on /<group>/<class> and not <object_id>, then add:

source=".../access.log" | rex field=script_path "(?<script_class>(/[^/]+){1,2})"

to your search string. This will generate a new field called script_class that contains only the first two segments of your URI path (or just the first, if the path has only one). You can then operate on script_class just like any other field, so to complete your original search string:

source=".../access.log" 
| rex field=script_path "(?<script_class>(/[^/]+){1,2})"
| stats avg(response_time) by script_class 
| sort avg(response_time) desc

You probably don't want to type this in every time you search, so you can add the extraction permanently to your app via transforms. The field script_class will then be extracted automatically.
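
As a rough sketch of what that could look like: the stanza name script_class and the sourcetype placeholder below are assumptions for illustration, not something taken from your setup, and this assumes script_path has already been extracted by the time this transform runs.

# transforms.conf (stanza name is hypothetical)
[script_class]
# Run the regex against the script_path field instead of the raw event
SOURCE_KEY = script_path
# The named capture group becomes the field script_class
REGEX = (?<script_class>(/[^/]+){1,2})

# props.conf (replace the placeholder with your own sourcetype)
[<your_sourcetype>]
REPORT-script_class = script_class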

mikebrittain · ‎05-07-2010

This is a good start. Unfortunately, most of our URLs are not this standardized.

It looks like "rex" will work using mode=sed.
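
A minimal sketch of what I have in mind, assuming the IDs are always plain runs of digits. Note that sed mode rewrites script_path in place, and this pattern replaces every run of digits, so a path like /v2/list would also turn into /v0/list:

source=".../access.log"
| rex field=script_path mode=sed "s/\d+/0/g"
| stats avg(response_time) by script_path
| sort avg(response_time) desc

With that, /view/item/12345 and /view/item/12346 both become /view/item/0 before the stats are computed.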

bfaber · ‎05-06-2010

Perhaps you could generalize with a field? I don't know if it matches your data, but when I come across something that looks like http:/url/path/here&some_junk&12345&blahblahblah, I often create a field that extracts only the http:/url/path/here part so I can report on that. Make sense?

mikebrittain · ‎05-07-2010

Sadly, our site URLs have pretty wide variations in format and that's not going to work for me.
