I have this JSON file which has a first (header) row that is different from the body. Number of fields in header row is less than in body. However, two fields are common in all rows. Example:
{"project":"1.0","build":"20","milestone":"","platform":"","id":"A1"}
{"build":"20","area":"Area1","status":"Pass","owner":"Engr1","manager":"Mgr1","id":"A1"}
I tried to make 'project' from header to append to each of subsequent rows, then do 'stats count(status) by project, build, area'. Since build is common, I thought from other threads' advice to simply use it as the filter i.e. index=main build | stats count...
index=main* build | stats values(build) as one by project
-> this can combine project with build, as it's the header row anywayindex=main* build | stats count(eval(status="FAIL") as "status" by build, area
-> this is fine as it's all in the body anywayindex=main* build | stats count(eval(status="FAIL") as "status" by build, area, project
-> this gives me zero resultsIf instead I do a join, index=main* build | join id [search index=main* domain | dedup id] | stats count(eval(status="FAIL") as "status" by build, area, project
-> this gets me correct data for the min(build) only. I have three distinct values of build but only the smallest is returned. How do I do a correct join?
After much hair-tearing, reading Splunk blogs, documentation on stats, streamstats, etc, I found this query to work: index=main* | join build [search index=main* platform] | where status="FAIL" | stats count(status) by build, project, area
I find it so little different from my 4th approach, how did it not work previously?
After much hair-tearing, reading Splunk blogs, documentation on stats, streamstats, etc, I found this query to work: index=main* | join build [search index=main* platform] | where status="FAIL" | stats count(status) by build, project, area
I find it so little different from my 4th approach, how did it not work previously?
You didn't use dedup is the biggest thing.
If you data contains just two rows for each build and id combination (common fields), one header and one body, like your example here, try this
index=main* build | stats values(*) as * by id,build | stats count(eval(mvcount(mvfilter(match(status,"FAIL"))))) as "status" by build, area, project
My data is one header and one body (body has multiple rows), and id, build occurs in all rows, both header and body. When I tried this, every field of area was returned with "status"=1, even though their (original) status was not "FAIL". Why did that happen?
Because this stats
command is wrong
move dedup command because you join with Id and in you subsearch you dedup eliminate other one of the events with same id.
try this:
index=main* build | join id [search index=lint* domain ] | stats c(eval(status="FAIL") as "status" by build, area, project
This again produced an answer where only my min(build) results were displayed. But it would work if instead I had done: index=main* | join build_num [search index=main* platform] | stats c(eval(status="FAIL") as "status" by build, area, project
Your naming in the join is not consistent with what you did before. However, I think your problems with the join are
(1) deduping the id eliminates all but one of the events.
(2) by default, the events from the main pipeline can join with only one event from the subsearch. You may need to use the the max
option of the join
command.
But I would do it differently. The following should work IF the header row always appears prior to the body rows
NOTE: Edited to fix a problem with the original streamstats I wrote
index=main* build
| streamstats last(project) as lastproject current=f window=1 by build
| eval project = if(notnull(project),project,lastproject)
| where status=="FAIL"
| stats count as status by build, area, project
Yes, you're right, I edited my question for consistency. When I tried this, I could not get a statistics of stats count, but I could see the events filtered to those where status=="FAIL". However in these events, the project is blank as it wasn't appended. How can I append project to these events?
I didn't do the streamstats command properly - I edited my answer above and hope this will fix it!
Let me know if it does not work!
I edited 'notnull' to 'isnotnull'. It's the same. I can see the events up to
index=main* build
| streamstats last(project) as lastproject current=f window=1 by build
| eval project = if(isnotnull(project),project,lastproject)
| where status=="FAIL"
but when I view the search results at this query level, project field is empty. When I append | stats count as status by build, area, project
to the end, no results come out unless I remove project.
Another thing I don't understand about this query is that when I do
index=main* build
| streamstats last(project) as lastproject current=f window=1 by build | table lastproject
, I get no results. Why is that?
Crud, something is still not working to propagate the project field across the events. The reason that you get nothing with stats command is that the project does not exist in all events. lastproject should be there, but since it isn't, then something is still probably wrong with the streamstats...