Splunk Search

How do I bring results from my subsearch into my outer search's table? (Subsearch in outer search of join.)

thisissplunk
Builder

Looked at join and append. Tried both, couldn't get them working. I need your eyes to help me here!

This is my current search where I'd like to actually hold onto some of the subsearch's data to toss them into the table in the outer search to add context. Outer search has hosts and the hashes that were seen on them, and the subsearch sourcetype "fileinfo" has the juicy file data I want for context. Each index is a different work site, full of sourcetypes that are also in every other index/site (but with their own unique data/events):

[search index=site_* sourcetype=fileinfo | dedup hash | fields hash index] 
| stats values(host) by index host hash

What I really want to do is simply put fields from the fileinfo sourctype in the inner search onto the end of the stats section. I cannot figure out how to do it though. When I try an inner join, it returns way too many events that don't even have the same fields that I'm trying to match on. I want to join data right?

Example of current output:

host        hash                           timesseen
abc1host    12389hu4t223eg732327gfr2367    12-25-2015

Example of what I want:

host        hash                           timesseen     fileinfo_mime_type   file_info_date_created
abc1host    12389hu4t223eg732327gfr2367    12-25-2015    gif                  03-01-2009

Anyone?

1 Solution

thisissplunk
Builder

Thanks for the answers. I did end up figuring it out myself:

Orignal:

 index=site_* sourcetype=host_hashes [search index=site_* sourcetype=fileinfo mime_type="gif"  | dedup hash | fields hash index] 
 | stats values(host) by index host hash

Working one with join:

 index=site_* sourcetype=host_hashes [search index=site_* sourcetype=fileinfo mime_type="gif"  | dedup hash | fields hash index] 
 | join hash [search index=*_fn sourcetype=fileinfo] 
 | stats count by index host hash fileinfo_mime_type fileinfo_date_created

The trick was understanding that my initial subsearch is part of the "outer search" of the join. The "inner search" is the subsearch after the join command. So yeah, two subsearches made it tricky.

So yeah - what I'm doing is asking "give me every hash that is a gif via the fileinfo sourcetype, now tell me if any of those hashes have been seen on our hosts via our host_hashes sourcetype, then finally append useful data right back from fileinfo on the results.

View solution in original post

thisissplunk
Builder

Thanks for the answers. I did end up figuring it out myself:

Orignal:

 index=site_* sourcetype=host_hashes [search index=site_* sourcetype=fileinfo mime_type="gif"  | dedup hash | fields hash index] 
 | stats values(host) by index host hash

Working one with join:

 index=site_* sourcetype=host_hashes [search index=site_* sourcetype=fileinfo mime_type="gif"  | dedup hash | fields hash index] 
 | join hash [search index=*_fn sourcetype=fileinfo] 
 | stats count by index host hash fileinfo_mime_type fileinfo_date_created

The trick was understanding that my initial subsearch is part of the "outer search" of the join. The "inner search" is the subsearch after the join command. So yeah, two subsearches made it tricky.

So yeah - what I'm doing is asking "give me every hash that is a gif via the fileinfo sourcetype, now tell me if any of those hashes have been seen on our hosts via our host_hashes sourcetype, then finally append useful data right back from fileinfo on the results.

somesoni2
SplunkTrust
SplunkTrust

You may want to throw a stats in the 2nd subsearch (join). This will improve your search performance. Something like this

... | join hash [search index=*_fn sourcetype=fileinfo | stats count by hash fileinfo_mime_type fileinfo_date_created]  ....
0 Karma

dcarmack_splunk
Splunk Employee
Splunk Employee

The information I have to go on is limited, but I think this will work.

index=site_* (sourcetype=hashes_on_hosts OR sourcetype=fileinfo) | stats values(fileinfo_mime_type) AS fileinfo_mime_type values(file_info_date_created) AS file_info_date_created by host hash

thisissplunk
Builder

Thanks! I tried that but it didn't work. I need to filter on mime_type, but if I do that then I lose all of the events from the hashes_hosts sourcetype since they don't have a mime_type, losing all of the context.

That said, I found my answer.

0 Karma

somesoni2
SplunkTrust
SplunkTrust

Can you provide the query that is generating your "Current output" in the question? How many hashes are available in both sourcetypes?

0 Karma

thisissplunk
Builder

The current output is produced by the query above it. Well, I rearranged what's in stats, but it's the same besides that.

0 Karma

thisissplunk
Builder

My join search that isn't working is (bring back tons of events without a hash field in them):

index=site_* sourcetype=hashes_on_hosts | join hash [search index=site_* sourcetype=fileinfo]
0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...