Splunk Search

How do you get join functionality without using subsearch?

djain
Path Finder

Hey Splunkers,

Here is my original query, where the subsearch is being truncated at 50,000 records.

index=abc sourcetype=abc_errors
| rename device.headwaters.watermark.core.DeviceInfo.receiverId.string AS receiverId
| fields receiverId
| join receiverId [search index=abc sourcetype=abc_temp | fields receiverId billingId]
| table receiverId billingId

I am trying to rewrite it with stats so that I don't have to use join. Here is what I thought might work, but it doesn't:

index = abc (sourcetype=abc_errors OR sourcetype=abc_temp)
  | fields sourcetype receiverId billingId device.headwaters.watermark.core.DeviceInfo.receiverId.string
  | rename device.headwaters.watermark.core.DeviceInfo.receiverId.string AS receiverId 
  | dedup receiverId sourcetype
  | stats count AS total by receiverId
  | where total>1
  | table receiverId

Can someone tell me what I might be doing wrong? I know there is something funky about the dedup, but I can't think of anything else right now.

Thanks,
Divyank

0 Karma
1 Solution

djain
Path Finder

I figured out a way to do it using the coalesce idea from @adonio. Thank you for that. Here is the solution query:

index=abc (sourcetype=abc_errors OR sourcetype=abc_temp)
| fields sourcetype receiverId billingId device.headwaters.watermark.core.DeviceInfo.receiverId.string
| rename device.headwaters.watermark.core.DeviceInfo.receiverId.string AS Receiver
| eval receiver_id = coalesce(Receiver, receiverId)
| dedup receiver_id sourcetype
| stats count(sourcetype) AS total BY receiver_id
| where total>1
| stats count(receiver_id) AS match
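
For anyone who also needs billingId back, as in the original join, a possible variation is sketched below. It is untested against this data and assumes billingId is only present in abc_temp events. Here dc(sourcetype) takes the place of the dedup plus count, and values(billingId) is what carries the billing IDs through.

```spl
index=abc (sourcetype=abc_errors OR sourcetype=abc_temp)
| eval receiver_id = coalesce('device.headwaters.watermark.core.DeviceInfo.receiverId.string', receiverId)
| stats dc(sourcetype) AS sourcetypes values(billingId) AS billingId BY receiver_id
| where sourcetypes > 1
| table receiver_id billingId
```

If a receiver maps to several billing IDs, billingId comes back as a multivalue field; appending mvexpand billingId should give one row per pair.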

Thank you everyone for your input

djain
Path Finder

@DalJeanis I am sorry for the direct tag, but you answered one of these questions for me perfectly, so I wanted to see if you can help me again.

0 Karma

Vijeta
Influencer

index=abc (sourcetype=abc_temp OR sourcetype=abc_errors)
| fields sourcetype receiverId billingId device.headwaters.watermark.core.DeviceInfo.receiverId.string
| rename device.headwaters.watermark.core.DeviceInfo.receiverId.string AS receiverId
| dedup sourcetype receiverId
| stats count(eval(sourcetype="abc_temp")) as temp, count(eval(sourcetype="abc_errors")) as errors by receiverId
| where temp=errors

0 Karma

djain
Path Finder

This did not work. It seems that neither the where nor the dedup is working.

0 Karma

Vijeta
Influencer

Try sorting on sourcetype and receiverId before the dedup. What output are you getting from the search above? You can test it by removing the where clause and checking the values of temp and errors for each receiverId.

0 Karma

adonio
Ultra Champion

try this:

index = abc (sourcetype=abc_errors OR sourcetype=abc_temp)
| fields sourcetype receiverId billingId device.headwaters.watermark.core.DeviceInfo.receiverId.string 
| eval receiver_id = coalesce(receiverId, 'device.headwaters.watermark.core.DeviceInfo.receiverId.string')
| stats count as total by receiver_id
| where total>1 
| table receiver_id

hope it helps

0 Karma

djain
Path Finder

This did not work; it didn't return any results. Also, aren't we failing to compare the receiverId across both sourcetypes? For example, if one sourcetype has more than one event for a receiverId, wouldn't it still show up in the results? We want only the receiverId values common to both sourcetypes to show.
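
To illustrate the concern with made-up events (the receiver values A, B, and C are hypothetical), the sketch below builds five rows with makeresults: A appears twice in abc_errors only, B appears once in each sourcetype, and C appears in abc_temp only. A plain count would pass A as well as B, while a distinct count of sourcetype should keep only B.

```spl
| makeresults count=5
| streamstats count AS n
| eval receiver_id = case(n<=2, "A", n<=4, "B", true(), "C")
| eval sourcetype = if(n<=3, "abc_errors", "abc_temp")
| stats count AS total dc(sourcetype) AS sourcetypes BY receiver_id
| where sourcetypes > 1
```

Swapping the last line for where total>1 shows A slipping through, even though it never appears in abc_temp.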

0 Karma

koshyk
Super Champion

Please try like below

index=abc sourcetype=abc_temp [search index=abc sourcetype=abc_errors | rename device.headwaters.watermark.core.DeviceInfo.receiverId.string AS receiverId | stats count by receiverId | fields receiverId]
| fields receiverId billingId
0 Karma

djain
Path Finder

Doesn't this approach still use a subsearch? That is the problem we are facing: the subsearch is limited to 50,000 rows.

0 Karma

koshyk
Super Champion

Please note that I'm doing a stats count by receiverId within the subsearch. Do you still expect the number of unique receiverId values to be greater than 50k?

0 Karma

djain
Path Finder

Yeah, it would be closer to a million.

0 Karma