Splunk Search

Join only returning values from one side of the search!

thisissplunk
Builder

I need to join data from two (or more, ultimately) different sourcetypes based on the shared "host" field. Just a subsearch will not achieve what I want here due to the way I want to stats the data out. This join almost works:

index=customer_1 sourcetype=http 
 [search index=customer_1  sourcetype=dns | dedup host | fields host]
| join host [search index=customer_1 sourcetype=dns] 
| stats values(dns_answer) values(http_uri) values(http_response) values(sourcetype) dc(sourcetype) values(_time) count by http_domain

It displays the expected data, almost! However, values(sourcetype) is only returning "dns" as a value! Join is not pushing the sourcetype value of "http" into the events, even though all of the other http fields are being joined correctly and displaying in the stats output. I imagine any other fields that share a name would only get the right side's value as well but I haven't tested. What gives?

Side note - I use a subsearch initially on the left side of the join because it would be too much data to feed into the right side without it. I'm not entirely sure what I'm doing is best practice but the way joins work are actually backwards in terms of the way I want to use them here, efficiently speaking (I wish I could flip the left and right queries but that doesn't work).

Anyways, anyone know what is going on here with the missing values from the left side?

Tags (2)
0 Karma

sundareshr
Legend

Have you tried it without the subsearch and join, like this

index=customer_1 sourcetype=http OR sourcetype=dns
 | stats dc(host) as hosts values(dns_answer) values(http_uri) values(http_response) values(sourcetype) dc(sourcetype) values(_time) count by http_domain | where hosts=1

thisissplunk
Builder

So this would work under normal circumstances. However, I need to group the events together if and only if a certain kind of event exists in the first place. If that events exists, it I take out the "host" with a subsearch and then search for that in other places. To confuse the point further,, I'm not really using "host" in my real query. The value could be a host, ip, etc, so I cannot effectively "count by http_domain" because I'll never know what was given to me first.

These logs are coming from an alert system.

0 Karma
Get Updates on the Splunk Community!

Detecting Remote Code Executions With the Splunk Threat Research Team

REGISTER NOWRemote code execution (RCE) vulnerabilities pose a significant risk to organizations. If ...

Observability | Use Synthetic Monitoring for Website Metadata Verification

If you are on Splunk Observability Cloud, you may already have Synthetic Monitoringin your observability ...

More Ways To Control Your Costs With Archived Metrics | Register for Tech Talk

Tuesday, May 14, 2024  |  11AM PT / 2PM ET Register to Attend Join us for this Tech Talk and learn how to ...