Solved: How to find individual averages of each stages{}.d...

joelwizard · ‎05-08-2023

I have some SPL that generates a table that looks like this for several builds of a job:

Prepare	1.003
Execute Test	44.544
Collate Results	556.44
Post	23.33

And it outputs this for each build that matches the SPL query. I want to calculate the average stages{}.duration for each stages{}.name across all of the builds returned. When I use avg(stages{}.duration), it seems to average over all of the results in a way that isn't coherent.

For instance, I want to display a bar chart of the following table:

Prepare	<Average of Prepare stage>
Execute Test	<Average of Execute Test stage>
Collate Results	<Average of Collate Results stage>
Post	<Average of Post stage>

yuanliu · ‎05-10-2023

You do not have to share proprietary events. But corporate guidelines will not prevent you from anonymizing data. For volunteers to help, you should illustrate key features/structure/format of your data, replacing every string if needed.

Based on what you have divulged so far, I deduce that you have something like

{"stage": [{"name":"Prepare", "duration": 1.003}, {"name":"Execute Test", "duration": 44.544}, {"name":"Collate Results", "duration": 556.44}, {"name":"Post", "duration": 23.33}]}

If you illustrate your data like this, I don't think your corporate tzar will mind. You would have saved volunteers tons of time reading minds.

I am not sure what you mean by "mvexpand seems to do nothing." If your raw event is anything like the above, spath + mvexpand is exactly the pathway to a solution.

| spath path=stage{}
| mvexpand stage{}
| spath input=stage{}
| stats avg(duration) as avg_duration by name

Using that single data point, the output is

name	avg_duration
Collate Results	556.44
Execute Test	44.544
Post	23.33
Prepare	1.003

Below is a data emulation you can play with and compare with real data.

| makeresults
| eval _raw = "{\"stage\": [{\"name\":\"Prepare\", \"duration\": 1.003},
{\"name\":\"Execute Test\", \"duration\": 44.544},
{\"name\":\"Collate Results\", \"duration\": 556.44},
{\"name\":\"Post\", \"duration\": 23.33}]}"
``` data emulation above ```
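To see any averaging happen, the emulation needs more than one build. A minimal sketch, assuming a second build whose durations (2.0 and 20.0) are invented here purely for illustration and are not from the thread, with the stage list shortened to two entries to keep it readable:

```spl
| makeresults count=2
| streamstats count as build
| eval _raw = if(build == 1,
    "{\"stage\": [{\"name\":\"Prepare\", \"duration\": 1.003}, {\"name\":\"Post\", \"duration\": 23.33}]}",
    "{\"stage\": [{\"name\":\"Prepare\", \"duration\": 2.0}, {\"name\":\"Post\", \"duration\": 20.0}]}")
| spath path=stage{}
| mvexpand stage{}
| spath input=stage{}
| stats avg(duration) as avg_duration by name
```

Here streamstats only numbers the two emulated events so that eval can give each one a different _raw. After mvexpand there is one row per stage per build, so the Prepare row should be the mean of 1.003 and 2.0, and the Post row the mean of 23.33 and 20.0.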

joelwizard · ‎05-09-2023

The SPL is returning multiple jobs, yet stats with avg(stages{}.duration) reports the same average for every stage.

yuanliu · ‎05-08-2023

Can you illustrate your data and results, and explain in what way the result isn't coherent? Maybe you have a syntax difficulty; for example, should you be using avg('stages{}.duration') instead?

joelwizard · ‎05-09-2023

I am using avg(stages{}.duration) and the averages aren't making sense. They are all in the same range when I know there are stages that take very little time on average.
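One plausible reading of this symptom, assuming the search groups by stages{}.name without first splitting each event: stages{}.name and stages{}.duration are both multivalue fields in every event, so stats produces one row per name but feeds each row all of the durations in the event. A sketch that reproduces this on emulated data (the stage key is a guess at the structure, and the values are borrowed from the example table, not from the real events):

```spl
| makeresults
| eval _raw = "{\"stage\": [{\"name\":\"Prepare\", \"duration\": 1.003}, {\"name\":\"Execute Test\", \"duration\": 44.544}, {\"name\":\"Collate Results\", \"duration\": 556.44}, {\"name\":\"Post\", \"duration\": 23.33}]}"
| spath
| stats avg(stage{}.duration) as avg_duration by stage{}.name
```

Under that reading, every row should show the same figure, the mean of all four durations (about 156.33), rather than its own stage's duration. That would match averages that all land in the same range.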

ITWhisperer · ‎05-08-2023

Without knowing what your actual events look like, it is not easy to suggest a solution.

Having said that, I am going to guess that your events contain more than one stage, each with a name and a duration. What you should try to do is split the event into multiple events, each with just one stage. You may be able to do this with spath (to extract at the stages level) and mvexpand (to create an event for each stage), then use stats to calculate your averages.

If this isn't enough to help you solve your issue, please be more specific and share your events and the SPL you have already tried.

joelwizard · ‎05-09-2023

My jobs have multiple nested stages{} each with a name and a duration and children. Doing this:

spath path=stages{} output=extracted_stages

Gets me a new field with a block of stages. mvexpand seems to do nothing. If I generate a table with just the extracted_stages, I see the same table. Unfortunately, I can't really post the full copy/pasta because of corporate guidelines.

In my current example, I just want the duration of the "top-level" stages{}, not their children. I think the avg(stages{}.duration) is nonsensical, because I have an initial stage "Prepare" that never takes more than a couple of seconds, yet the average displayed for it is in a range similar to all the other stages, which makes me think the durations are somehow being averaged together across stages.
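For the top-level-only requirement, one option is to expand the extracted_stages field produced by the spath call above, then pull name and duration by explicit path so that same-named keys inside nested children are never read. A sketch, assuming each element of stages{} is a JSON object with its own name and duration keys; stage_name, stage_duration, and avg_duration are illustrative field names, not taken from the original search:

```spl
| spath path=stages{} output=extracted_stages
| mvexpand extracted_stages
| spath input=extracted_stages path=name output=stage_name
| spath input=extracted_stages path=duration output=stage_duration
| stats avg(stage_duration) as avg_duration by stage_name
```

Note that mvexpand has to be given the output field (extracted_stages here) rather than stages{}. Pointing it at a field that does not exist in the results is one possible reason it would appear to do nothing. The two-column result of the final stats can be rendered directly as a bar chart.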

How to find individual averages of each stages{}.duration by stages.name{} in Splunk for Jenkins app?
