Solved: Re: How to join a header of a JSON file to the res...

jalau9 · ‎01-03-2015

I have this JSON file which has a first (header) row that is different from the body. Number of fields in header row is less than in body. However, two fields are common in all rows. Example:

{"project":"1.0","build":"20","milestone":"","platform":"","id":"A1"}
{"build":"20","area":"Area1","status":"Pass","owner":"Engr1","manager":"Mgr1","id":"A1"}

I tried to make 'project' from header to append to each of subsequent rows, then do 'stats count(status) by project, build, area'. Since build is common, I thought from other threads' advice to simply use it as the filter i.e. index=main build | stats count...

index=main* build | stats values(build) as one by project -> this can combine project with build, as it's the header row anyway
index=main* build | stats count(eval(status="FAIL") as "status" by build, area -> this is fine as it's all in the body anyway
index=main* build | stats count(eval(status="FAIL") as "status" by build, area, project -> this gives me zero results

If instead I do a join, index=main* build | join id [search index=main* domain | dedup id] | stats count(eval(status="FAIL") as "status" by build, area, project -> this gets me correct data for the min(build) only. I have three distinct values of build but only the smallest is returned. How do I do a correct join?

jalau9 · ‎01-08-2015

After much hair-tearing, reading Splunk blogs, documentation on stats, streamstats, etc, I found this query to work: index=main* | join build [search index=main* platform] | where status="FAIL" | stats count(status) by build, project, area
I find it so little different from my 4th approach, how did it not work previously?

View solution in original post

jalau9 · ‎01-08-2015

After much hair-tearing, reading Splunk blogs, documentation on stats, streamstats, etc, I found this query to work: index=main* | join build [search index=main* platform] | where status="FAIL" | stats count(status) by build, project, area
I find it so little different from my 4th approach, how did it not work previously?

lguinn2 · ‎01-08-2015

You didn't use dedup is the biggest thing.

somesoni2 · ‎01-05-2015

If you data contains just two rows for each build and id combination (common fields), one header and one body, like your example here, try this

index=main* build | stats values(*) as * by id,build | stats count(eval(mvcount(mvfilter(match(status,"FAIL"))))) as "status" by build, area, project

jalau9 · ‎01-06-2015

My data is one header and one body (body has multiple rows), and id, build occurs in all rows, both header and body. When I tried this, every field of area was returned with "status"=1, even though their (original) status was not "FAIL". Why did that happen?

lguinn2 · ‎01-07-2015

Because this stats command is wrong

fdi01 · ‎01-05-2015

move dedup command because you join with Id and in you subsearch you dedup eliminate other one of the events with same id.
try this:

index=main* build | join id [search index=lint* domain ] | stats c(eval(status="FAIL") as "status" by build, area, project

jalau9 · ‎01-08-2015

This again produced an answer where only my min(build) results were displayed. But it would work if instead I had done: index=main* | join build_num [search index=main* platform] | stats c(eval(status="FAIL") as "status" by build, area, project

lguinn2 · ‎01-04-2015

Your naming in the join is not consistent with what you did before. However, I think your problems with the join are

(1) deduping the id eliminates all but one of the events.

(2) by default, the events from the main pipeline can join with only one event from the subsearch. You may need to use the the max option of the join command.

But I would do it differently. The following should work IF the header row always appears prior to the body rows

NOTE: Edited to fix a problem with the original streamstats I wrote

index=main* build 
| streamstats last(project) as lastproject current=f window=1 by build
| eval project = if(notnull(project),project,lastproject)
| where status=="FAIL"
| stats count as status by build, area, project

jalau9 · ‎01-06-2015

Yes, you're right, I edited my question for consistency. When I tried this, I could not get a statistics of stats count, but I could see the events filtered to those where status=="FAIL". However in these events, the project is blank as it wasn't appended. How can I append project to these events?

lguinn2 · ‎01-07-2015

I didn't do the streamstats command properly - I edited my answer above and hope this will fix it!
Let me know if it does not work!

jalau9 · ‎01-07-2015

I edited 'notnull' to 'isnotnull'. It's the same. I can see the events up to

index=main* build 
 | streamstats last(project) as lastproject current=f window=1 by build
 | eval project = if(isnotnull(project),project,lastproject)
 | where status=="FAIL"

but when I view the search results at this query level, project field is empty. When I append | stats count as status by build, area, project to the end, no results come out unless I remove project.

Another thing I don't understand about this query is that when I do

index=main* build 
     | streamstats last(project) as lastproject current=f window=1 by build | table lastproject

, I get no results. Why is that?

lguinn2 · ‎01-08-2015

Crud, something is still not working to propagate the project field across the events. The reason that you get nothing with stats command is that the project does not exist in all events. lastproject should be there, but since it isn't, then something is still probably wrong with the streamstats...

How to join a header of a JSON file to the rest of the rows by one common field?

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

They're back! Join the SplunkTrust and MVP at .conf24

Enterprise Security Content Update (ESCU) | New Releases