Splunk Search

Aggregating fields in JSON array

mbosse
Explorer

I'm relatively new to Splunk queries. I have an event that contains JSON and within the JSON data is an array. There's some data about a web page request, then an array of resources that make up the page, with some timing data for each resource.

I'd like to get things like the resource with the longest duration for a given request, and the average duration across all resources for a given request.

Here's the data:

{
  "ts": "2016-01-21T12:15:16.0451054-05:00",
  "id": "a95dc052-de57-45a5-be8b-4eee1b7a39ec",
  "url": "http://www.mysite.com/default.htm",
  "res":
  [
    {
      "r_tid": "a95dc052-de57-45a5-be8b-4eee1b7a39ec",
      "r_nm": "http://www.mysite.com/file1.css",
      "duration": 8.2
    },
    {
      "r_tid": "a95dc052-de57-45a5-be8b-4eee1b7a39ec",
      "r_nm": "http://www.mysite.com/file2.css",
      "duration": 7.731
    },
    {
      "r_tid": "a95dc052-de57-45a5-be8b-4eee1b7a39ec",
      "r_nm": "http://www.mysite.com/file1.js",
      "duration": 16.909
    }
  ]
}

And for max duration, something like this as the output:
r_nm, duration
http://www.mysite.com/file1.js, 16.909

and for the average, something like this:
url, average_duration
http://www.mysite.com/default.htm, 10.947

I've tried spath a number of different ways based on other posts here, but can't seem to get the right syntax. Any help would be greatly appreciated. The r_tid of each resource will always equal the id of the request, in case a join is needed to do this.

0 Karma
1 Solution

martin_mueller
SplunkTrust
SplunkTrust

Here's an idea:

| stats count | eval _raw = "{\"ts\": \"2016-01-21T12:15:16.0451054-05:00\",\"id\": \"a95dc052-de57-45a5-be8b-4eee1b7a39ec\",\"url\": \"http://www.mysite.com/default.htm\",\"res\": [{\"r_tid\": \"a95dc052-de57-45a5-be8b-4eee1b7a39ec\",\"r_nm\": \"http://www.mysite.com/file1.css\",\"duration\": 8.2}, {\"r_tid\": \"a95dc052-de57-45a5-be8b-4eee1b7a39ec\",\"r_nm\": \"http://www.mysite.com/file2.css\",\"duration\": 7.731}, {\"r_tid\": \"a95dc052-de57-45a5-be8b-4eee1b7a39ec\",\"r_nm\": \"http://www.mysite.com/file1.js\",\"duration\": 16.909}]}" 
| spath url | spath res{} output=res | mvexpand res | spath input=res

The first line generates a dummy event from your example; the second extracts the url, splits the event into one row per resource, and extracts the per-resource fields. (Side note: if you already have field extractions configured and see a url field, you can drop the first spath.) After that, append one of these two to get your statistics:

| stats avg(duration) by url
| stats max(duration) by r_nm
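
If you want the single longest resource per request (the first output shape you described, with the resource name alongside its duration), one option is to sort descending by duration and keep the first row per url. This is a sketch, assuming url, r_nm, and duration are all extracted as above:

| spath url | spath res{} output=res | mvexpand res | spath input=res
| sort - duration
| dedup url
| table url r_nm duration

Since dedup keeps the first matching row it sees, sorting by duration first leaves only the highest-duration resource for each url.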

mbosse
Explorer

I did a little more digging and troubleshooting and found that during my import I was getting a warning on entries that exceeded 10,000 bytes. I tracked down the config setting that controls that and increased the TRUNCATE value in props.conf. Everything is working well now. Thanks again.
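
For anyone hitting the same wall: the change described above would look something like this in props.conf on the indexing tier, where the stanza name is a placeholder for your actual sourcetype and the default TRUNCATE is 10000 bytes:

[my_json_sourcetype]
# Raise the per-event truncation limit so large JSON arrays aren't cut off
# (0 disables truncation entirely, which is riskier)
TRUNCATE = 100000

Events already indexed before the change stay truncated; only newly indexed data benefits.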

martin_mueller
SplunkTrust
SplunkTrust

Well done 🙂

0 Karma

mbosse
Explorer

Oh - forgot to mention, if I use the dummy data approach and put exactly the same data with the 91 resources in the JSON array as was imported, it seems to work fine. So this tells me it's likely a limitation on the import, the indexing, or on the search (or maybe a combination). So again, thanks for your idea. I think what you proposed is a sound approach and now I have a different issue to work through.

0 Karma

mbosse
Explorer

Thanks martin_mueller. When I used this on some real data, I had to add "| fields url, r_dur, r_nm" before the stats in order to actually get a value. I still don't fully understand when to include the fields clause, but that got me what I needed. However, in the real data, on the one request event I'm validating with, there are 91 resources in the JSON array. For some reason, only the first 34 in the array are being used to calculate the average or the max. I'll likely post another question in here specific to that. Otherwise, I believe what you provided is working with the addition of the fields clause. Thank you.
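
For anyone following along, the adjusted search would look something like this (r_dur is a field name from the real data rather than the example above, so treat it as a placeholder for your own duration field):

| spath url | spath res{} output=res | mvexpand res | spath input=res
| fields url, r_dur, r_nm
| stats avg(r_dur) by url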

0 Karma

paulwrussell
Explorer

I still can't get real data to display anything other than a big count. Adding fields didn't make a difference. What is fields meant to be doing in this scenario? Why would the eval-generated dummy event behave differently from real data?

0 Karma