Splunk Search

Join only returning values from one side of the search!

thisissplunk
Builder

I need to join data from two (or more, ultimately) different sourcetypes based on the shared "host" field. Just a subsearch will not achieve what I want here due to the way I want to stats the data out. This join almost works:

index=customer_1 sourcetype=http 
 [search index=customer_1  sourcetype=dns | dedup host | fields host]
| join host [search index=customer_1 sourcetype=dns] 
| stats values(dns_answer) values(http_uri) values(http_response) values(sourcetype) dc(sourcetype) values(_time) count by http_domain

It displays the expected data, almost! However, values(sourcetype) is only returning "dns" as a value! Join is not pushing the sourcetype value of "http" into the events, even though all of the other http fields are being joined correctly and displaying in the stats output. I imagine any other fields that share a name would only get the right side's value as well but I haven't tested. What gives?

Side note - I use a subsearch initially on the left side of the join because it would be too much data to feed into the right side without it. I'm not entirely sure what I'm doing is best practice but the way joins work are actually backwards in terms of the way I want to use them here, efficiently speaking (I wish I could flip the left and right queries but that doesn't work).

Anyways, anyone know what is going on here with the missing values from the left side?

Tags (2)
0 Karma

sundareshr
Legend

Have you tried it without the subsearch and join, like this

index=customer_1 sourcetype=http OR sourcetype=dns
 | stats dc(host) as hosts values(dns_answer) values(http_uri) values(http_response) values(sourcetype) dc(sourcetype) values(_time) count by http_domain | where hosts=1

thisissplunk
Builder

So this would work under normal circumstances. However, I need to group the events together if and only if a certain kind of event exists in the first place. If that events exists, it I take out the "host" with a subsearch and then search for that in other places. To confuse the point further,, I'm not really using "host" in my real query. The value could be a host, ip, etc, so I cannot effectively "count by http_domain" because I'll never know what was given to me first.

These logs are coming from an alert system.

0 Karma
Get Updates on the Splunk Community!

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...