Solved: How do you get join functionality without using subsearch?

djain · ‎10-11-2018

Hey Splunkers,

Here is my original query, where the subsearch is getting truncated to 50000 records.

index = abc sourcetype=abc_errors 
| rename  device.headwaters.watermark.core.DeviceInfo.receiverId.string AS receiverId
| fields receiverId 
| join receiverId[search index=abc sourcetype=abc_temp|fields receiverId billingId]
| table receiverId billingId

I am trying to write a stats command for it so that I don't have to use join. Here is what I thought might work, but it doesn't.

index = abc (sourcetype=abc_errors OR sourcetype=abc_temp)
  | fields sourcetype receiverId billingId device.headwaters.watermark.core.DeviceInfo.receiverId.string
  | rename device.headwaters.watermark.core.DeviceInfo.receiverId.string AS receiverId 
  | dedup receiverId sourcetype
  | stats count AS total by receiverId
  | where total>1
  | table receiverId

Can someone tell me what I might be doing wrong? I know there is something funky about the dedup, but I can't think of anything else right now.

Thanks,
Divyank

djain · ‎10-12-2018

I figured out a way to do it using the coalesce idea from @adonio. Thank you for that. Here is the solution query:

index=abc (sourcetype=abc_errors OR sourcetype=abc_temp)
| fields sourcetype receiverId billingId device.headwaters.watermark.core.DeviceInfo.receiverId.string
| rename device.headwaters.watermark.core.DeviceInfo.receiverId.string AS Receiver
| eval receiver_id = coalesce(Receiver, receiverId)
| dedup receiver_id sourcetype
| stats count(sourcetype) AS total BY receiver_id
| where total>1
| stats count(receiver_id) AS match

Thank you, everyone, for your input.
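
The solution query above only counts the matching IDs. If the receiverId and billingId pairs from the original join are needed, a variation along the following lines might work. It is a minimal sketch, not a tested replacement: it assumes that billingId exists only in the abc_temp events, and the field name sourcetype_count is made up for illustration. The single quotes around the dotted field name are how eval refers to field names containing special characters.

```spl
index=abc (sourcetype=abc_errors OR sourcetype=abc_temp)
| eval receiver_id = coalesce('device.headwaters.watermark.core.DeviceInfo.receiverId.string', receiverId)
| stats dc(sourcetype) AS sourcetype_count values(billingId) AS billingId BY receiver_id
| where sourcetype_count = 2
| table receiver_id billingId
```

Unlike join, this returns one row per receiver_id rather than one row per abc_errors event, and billingId becomes a multivalue field if a receiver maps to more than one billing ID.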

djain · ‎10-11-2018

@DalJeanis I am sorry for the direct tag, but you answered one of these questions for me perfectly, so I wanted to see if you can help me again.

Vijeta · ‎10-11-2018

index=abc (sourcetype=abc_temp OR sourcetype=abc_errors)
| fields sourcetype receiverId billingId device.headwaters.watermark.core.DeviceInfo.receiverId.string
| rename device.headwaters.watermark.core.DeviceInfo.receiverId.string AS receiverId
| dedup sourcetype receiverId
| stats count(eval(sourcetype="abc_temp")) as temp, count(eval(sourcetype="abc_errors")) as errors by receiverId
| where temp=errors

djain · ‎10-11-2018

This did not work. It seems that neither the where nor the dedup command is working.

Vijeta · ‎10-11-2018

Try sorting on sourcetype and receiverId before the dedup. What output do you get from the search above? You can test it by removing the where clause and checking the values of temp and errors for each receiverId.

adonio · ‎10-11-2018

try this:

index = abc (sourcetype=abc_errors OR sourcetype=abc_temp)
| fields sourcetype receiverId billingId device.headwaters.watermark.core.DeviceInfo.receiverId.string 
| eval receiver_id = coalesce(receiverId, 'device.headwaters.watermark.core.DeviceInfo.receiverId.string')
| stats count as total by receiver_id
| where total>1 
| table receiver_id

hope it helps

djain · ‎10-11-2018

This did not work; it returned no results. Also, aren't we failing to compare the receiverId across the two sourcetypes? For example, if one sourcetype has more than one event for a receiverId, wouldn't it still show up in the results? We want only the receiverId values common to both sourcetypes to show.
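
The difference between counting events per ID and counting distinct sourcetypes per ID can be seen on a small synthetic data set. The search below is only an illustration: the IDs A and B and the helper field n are made up. It builds four events, where A appears twice in abc_errors only and B appears once in each sourcetype.

```spl
| makeresults count=4
| streamstats count AS n
| eval sourcetype = if(n < 4, "abc_errors", "abc_temp")
| eval receiver_id = if(n < 3, "A", "B")
| stats count AS total dc(sourcetype) AS sourcetype_count BY receiver_id
```

Both IDs come back with total=2, so a filter of total>1 keeps A even though it never appears in abc_temp. Only B has sourcetype_count=2, so filtering on sourcetype_count=2 (or deduplicating on receiver_id and sourcetype before counting) keeps just the IDs common to both sourcetypes.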

koshyk · ‎10-11-2018

Please try like below

index=abc sourcetype=abc_temp [search index=abc sourcetype=abc_errors | rename device.headwaters.watermark.core.DeviceInfo.receiverId.string AS receiverId | stats count by receiverId | fields receiverId]
| fields receiverId billingId

djain · ‎10-11-2018

This approach uses a subsearch? That is the problem we are facing: the subsearch is limited to 50000 rows.

koshyk · ‎10-12-2018

Please note, I'm doing a stats count by receiverId within the second search. So do you still expect the number of unique receiverId values to be greater than 50k?

djain · ‎10-12-2018

Yeah it would be closer to a million.
