Splunk Search

How to join a header of a JSON file to the rest of the rows by one common field?

jalau9
Explorer

I have this JSON file which has a first (header) row that is different from the body. Number of fields in header row is less than in body. However, two fields are common in all rows. Example:

{"project":"1.0","build":"20","milestone":"","platform":"","id":"A1"}
{"build":"20","area":"Area1","status":"Pass","owner":"Engr1","manager":"Mgr1","id":"A1"}

I tried to make 'project' from header to append to each of subsequent rows, then do 'stats count(status) by project, build, area'. Since build is common, I thought from other threads' advice to simply use it as the filter i.e. index=main build | stats count...

  1. index=main* build | stats values(build) as one by project -> this can combine project with build, as it's the header row anyway
  2. index=main* build | stats count(eval(status="FAIL") as "status" by build, area -> this is fine as it's all in the body anyway
  3. index=main* build | stats count(eval(status="FAIL") as "status" by build, area, project -> this gives me zero results

If instead I do a join, index=main* build | join id [search index=main* domain | dedup id] | stats count(eval(status="FAIL") as "status" by build, area, project -> this gets me correct data for the min(build) only. I have three distinct values of build but only the smallest is returned. How do I do a correct join?

Tags (4)
0 Karma
1 Solution

jalau9
Explorer

After much hair-tearing, reading Splunk blogs, documentation on stats, streamstats, etc, I found this query to work: index=main* | join build [search index=main* platform] | where status="FAIL" | stats count(status) by build, project, area
I find it so little different from my 4th approach, how did it not work previously?

View solution in original post

jalau9
Explorer

After much hair-tearing, reading Splunk blogs, documentation on stats, streamstats, etc, I found this query to work: index=main* | join build [search index=main* platform] | where status="FAIL" | stats count(status) by build, project, area
I find it so little different from my 4th approach, how did it not work previously?

lguinn2
Legend

You didn't use dedup is the biggest thing.

0 Karma

somesoni2
Revered Legend

If you data contains just two rows for each build and id combination (common fields), one header and one body, like your example here, try this

index=main* build | stats values(*) as * by id,build | stats count(eval(mvcount(mvfilter(match(status,"FAIL"))))) as "status" by build, area, project
0 Karma

jalau9
Explorer

My data is one header and one body (body has multiple rows), and id, build occurs in all rows, both header and body. When I tried this, every field of area was returned with "status"=1, even though their (original) status was not "FAIL". Why did that happen?

0 Karma

lguinn2
Legend

Because this stats command is wrong

0 Karma

fdi01
Motivator

move dedup command because you join with Id and in you subsearch you dedup eliminate other one of the events with same id.
try this:

index=main* build | join id [search index=lint* domain ] | stats c(eval(status="FAIL") as "status" by build, area, project

0 Karma

jalau9
Explorer

This again produced an answer where only my min(build) results were displayed. But it would work if instead I had done: index=main* | join build_num [search index=main* platform] | stats c(eval(status="FAIL") as "status" by build, area, project

0 Karma

lguinn2
Legend

Your naming in the join is not consistent with what you did before. However, I think your problems with the join are

(1) deduping the id eliminates all but one of the events.

(2) by default, the events from the main pipeline can join with only one event from the subsearch. You may need to use the the max option of the join command.

But I would do it differently. The following should work IF the header row always appears prior to the body rows

NOTE: Edited to fix a problem with the original streamstats I wrote

index=main* build 
| streamstats last(project) as lastproject current=f window=1 by build
| eval project = if(notnull(project),project,lastproject)
| where status=="FAIL"
| stats count as status by build, area, project

jalau9
Explorer

Yes, you're right, I edited my question for consistency. When I tried this, I could not get a statistics of stats count, but I could see the events filtered to those where status=="FAIL". However in these events, the project is blank as it wasn't appended. How can I append project to these events?

0 Karma

lguinn2
Legend

I didn't do the streamstats command properly - I edited my answer above and hope this will fix it!
Let me know if it does not work!

0 Karma

jalau9
Explorer

I edited 'notnull' to 'isnotnull'. It's the same. I can see the events up to

index=main* build 
 | streamstats last(project) as lastproject current=f window=1 by build
 | eval project = if(isnotnull(project),project,lastproject)
 | where status=="FAIL"

but when I view the search results at this query level, project field is empty. When I append | stats count as status by build, area, project to the end, no results come out unless I remove project.

Another thing I don't understand about this query is that when I do

index=main* build 
     | streamstats last(project) as lastproject current=f window=1 by build | table lastproject

, I get no results. Why is that?

0 Karma

lguinn2
Legend

Crud, something is still not working to propagate the project field across the events. The reason that you get nothing with stats command is that the project does not exist in all events. lastproject should be there, but since it isn't, then something is still probably wrong with the streamstats...

0 Karma
Get Updates on the Splunk Community!

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

Take a look below to explore our upcoming Community Office Hours, Tech Talks, and Webinars this month. This ...

They're back! Join the SplunkTrust and MVP at .conf24

With our highly anticipated annual conference, .conf, comes the fez-wearers you can trust! The SplunkTrust, as ...

Enterprise Security Content Update (ESCU) | New Releases

Last month, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...