Splunk Search

Join only returning values from one side of the search!

thisissplunk
Builder

I need to join data from two (or more, ultimately) different sourcetypes based on the shared "host" field. Just a subsearch will not achieve what I want here due to the way I want to stats the data out. This join almost works:

index=customer_1 sourcetype=http 
 [search index=customer_1  sourcetype=dns | dedup host | fields host]
| join host [search index=customer_1 sourcetype=dns] 
| stats values(dns_answer) values(http_uri) values(http_response) values(sourcetype) dc(sourcetype) values(_time) count by http_domain

It displays the expected data, almost! However, values(sourcetype) is only returning "dns" as a value! Join is not pushing the sourcetype value of "http" into the events, even though all of the other http fields are being joined correctly and displaying in the stats output. I imagine any other fields that share a name would only get the right side's value as well but I haven't tested. What gives?

Side note - I use a subsearch initially on the left side of the join because it would be too much data to feed into the right side without it. I'm not entirely sure what I'm doing is best practice but the way joins work are actually backwards in terms of the way I want to use them here, efficiently speaking (I wish I could flip the left and right queries but that doesn't work).

Anyways, anyone know what is going on here with the missing values from the left side?

Tags (2)
0 Karma

sundareshr
Legend

Have you tried it without the subsearch and join, like this

index=customer_1 sourcetype=http OR sourcetype=dns
 | stats dc(host) as hosts values(dns_answer) values(http_uri) values(http_response) values(sourcetype) dc(sourcetype) values(_time) count by http_domain | where hosts=1

thisissplunk
Builder

So this would work under normal circumstances. However, I need to group the events together if and only if a certain kind of event exists in the first place. If that events exists, it I take out the "host" with a subsearch and then search for that in other places. To confuse the point further,, I'm not really using "host" in my real query. The value could be a host, ip, etc, so I cannot effectively "count by http_domain" because I'll never know what was given to me first.

These logs are coming from an alert system.

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...