
How do you get join functionality without using subsearch?

djain
Path Finder

Hey Splunkers,

Here is my original query. The subsearch inside the join is getting truncated at 50,000 records.

index=abc sourcetype=abc_errors
| rename device.headwaters.watermark.core.DeviceInfo.receiverId.string AS receiverId
| fields receiverId
| join receiverId [search index=abc sourcetype=abc_temp | fields receiverId billingId]
| table receiverId billingId

I am trying to rewrite it with stats so that I don't have to use join. Here is what I thought would work, but it doesn't:

index = abc (sourcetype=abc_errors OR sourcetype=abc_temp)
  | fields sourcetype receiverId billingId device.headwaters.watermark.core.DeviceInfo.receiverId.string
  | rename device.headwaters.watermark.core.DeviceInfo.receiverId.string AS receiverId 
  | dedup receiverId sourcetype
  | stats count AS total by receiverId
  | where total>1
  | table receiverId

Can someone tell me what I might be doing wrong? I know there is something funky about the dedup, but I can't think of anything else right now.

Thanks,
Divyank

1 Solution

djain
Path Finder

I figured out a way to do it using the coalesce idea from @adonio. Thank you for that! Here is the solution query:

index=abc (sourcetype=abc_errors OR sourcetype=abc_temp)
| fields sourcetype receiverId billingId device.headwaters.watermark.core.DeviceInfo.receiverId.string
| rename device.headwaters.watermark.core.DeviceInfo.receiverId.string AS Receiver
| eval receiver_id = coalesce(Receiver, receiverId)
| dedup receiver_id sourcetype
| stats count(sourcetype) AS total BY receiver_id
| where total>1
| stats count(receiver_id) AS match

Thank you, everyone, for your input.
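A variant of the same stats-instead-of-join idea that also returns billingId, as the original join did, might look like the sketch below. It assumes, as in the thread, that abc_errors events carry the ID only in device.headwaters.watermark.core.DeviceInfo.receiverId.string, while abc_temp events carry receiverId and billingId; the field names sourcetype_count and receiver_id are illustrative choices. Instead of dedup followed by count, it uses dc(sourcetype), so a receiver_id passes the where clause only if it occurs in both sourcetypes, no matter how many events each one has, and values(billingId) collects the billing IDs from the abc_temp side. Inside eval, the dotted field name is wrapped in single quotes, because an unquoted period is the eval string-concatenation operator. Because everything happens in one stats pass with no subsearch, the 50,000-row subsearch limit does not come into play. This is a sketch that has not been run against this data.

index=abc (sourcetype=abc_errors OR sourcetype=abc_temp)
| fields sourcetype receiverId billingId device.headwaters.watermark.core.DeviceInfo.receiverId.string
| eval receiver_id=coalesce('device.headwaters.watermark.core.DeviceInfo.receiverId.string', receiverId)
| stats dc(sourcetype) AS sourcetype_count values(billingId) AS billingId BY receiver_id
| where sourcetype_count=2
| table receiver_id billingId

If you only need the number of matching IDs, as in the solution query above, appending | stats count AS match gives the same kind of single-number result.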

djain
Path Finder

@DalJeanis Sorry for the direct tag, but you answered one of these questions for me perfectly, so I wanted to see if you could help me again.


Vijeta
Influencer
index=abc (sourcetype=abc_temp OR sourcetype=abc_errors)
| fields sourcetype receiverId billingId device.headwaters.watermark.core.DeviceInfo.receiverId.string
| rename device.headwaters.watermark.core.DeviceInfo.receiverId.string AS receiverId
| dedup sourcetype receiverId
| stats count(eval(sourcetype="abc_temp")) as temp, count(eval(sourcetype="abc_errors")) as errors by receiverId
| where temp=errors

djain
Path Finder

This did not work. It seems like neither the where nor the dedup command is working.


Vijeta
Influencer

Try sorting on sourcetype and receiverId before the dedup. What output do you get from the search above? You can test it by removing the where clause and looking at the values of temp and errors for each receiverId.
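Combining those two suggestions with the earlier search gives a diagnostic version roughly like the sketch below: the same query with a sort before the dedup and the where clause dropped, so the temp and errors columns can be inspected for each receiverId. The 0 in sort 0 lifts the sort command's default limit of 10,000 results, which would otherwise truncate a result set of this size. This is only an illustration of the suggested debugging step, not a fix in itself.

index=abc (sourcetype=abc_temp OR sourcetype=abc_errors)
| fields sourcetype receiverId billingId device.headwaters.watermark.core.DeviceInfo.receiverId.string
| rename device.headwaters.watermark.core.DeviceInfo.receiverId.string AS receiverId
| sort 0 sourcetype receiverId
| dedup sourcetype receiverId
| stats count(eval(sourcetype="abc_temp")) AS temp count(eval(sourcetype="abc_errors")) AS errors BY receiverId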


adonio
Ultra Champion

Try this:

index=abc (sourcetype=abc_errors OR sourcetype=abc_temp)
| fields sourcetype receiverId billingId device.headwaters.watermark.core.DeviceInfo.receiverId.string
| eval receiver_id = coalesce(receiverId, 'device.headwaters.watermark.core.DeviceInfo.receiverId.string')
| stats count as total by receiver_id
| where total>1
| table receiver_id

Hope it helps.


djain
Path Finder

This did not work; it didn't return any results. Also, aren't we skipping the comparison of receiverId across both sourcetypes? For example, if one sourcetype has more than one event for a receiverId, wouldn't it still show up in the results? We want only the receiverIds that are common to both sourcetypes.


koshyk
Super Champion

Please try something like the following:

index=abc sourcetype=abc_temp
    [search index=abc sourcetype=abc_errors
    | rename device.headwaters.watermark.core.DeviceInfo.receiverId.string AS receiverId
    | stats count by receiverId
    | fields receiverId]
| fields receiverId billingId

djain
Path Finder

Isn't this approach using a subsearch? That is exactly the problem we are facing: the subsearch is limited to 50,000 rows.


koshyk
Super Champion

Please note that I'm doing a stats count by receiverId within the subsearch. Do you still expect the number of unique receiverIds to be greater than 50k?


djain
Path Finder

Yes, it would be closer to a million.
