Splunk Search

How to join two indexes in different time ranges?

mcohen13
Loves-to-Learn

I have two indexes:
index 1 contains a list of domains and event_timestamp, index 2 contains a description for every domain.
I want to join the two and enrich all domains in index 1 with their description in index 2.
Now when I enrich I want to look at all of the information on index 2 and enrich all domains in the time frame I choose in search

I tried to do that with join:

source="events" | join query.domain  [search index="events_enrich_with_desc" | rename event_domain AS query.domain ] earliest=-10y@y | search ...

But this join doesn't work properly (only 25% are enriched).

What am I missing here? if it can be done in a different way than join than I'm OK with it.

0 Karma
1 Solution

DalJeanis
Legend

1) That earliest= is in a strange place. What are you trying to do there?

Try this...

source="events" 
| join type=left query.domain  [  search index="events_enrich_with_desc" earliest=-10y@y  
    | dedup event_domain
    | rename event_domain AS query.domain ] 
| eval enrichfield=coalesce(enrichfield,"((not found))")
| search ...

That assumes that you want up to 10 years back from your enriched index, and that you only want the first answer for each domain.

View solution in original post

0 Karma

DalJeanis
Legend

1) That earliest= is in a strange place. What are you trying to do there?

Try this...

source="events" 
| join type=left query.domain  [  search index="events_enrich_with_desc" earliest=-10y@y  
    | dedup event_domain
    | rename event_domain AS query.domain ] 
| eval enrichfield=coalesce(enrichfield,"((not found))")
| search ...

That assumes that you want up to 10 years back from your enriched index, and that you only want the first answer for each domain.

0 Karma

mcohen13
Loves-to-Learn

also, after that join i don't see the fields from Index 2
why is that? and how can i add those fields?

0 Karma

DalJeanis
Legend

That join should contain every field that is on either record. If not, please post your exact search language. It might help you to explicitly list the fields you want from each record, then take an example value for query.domain and verify that the fields are present.

 source="events" 
| fields query.domain ... and list all the fields from this one you want ...
 | join type=left query.domain  [  search index="events_enrich_with_desc" earliest=-10y@y  
     | dedup event_domain
     | rename event_domain AS query.domain 
     | fields  query.domain enrichfield    ... and list all the fields from this one you want ...
    ] 
 | eval enrichfield=coalesce(enrichfield,"((not found))")
 | search ...
0 Karma

mcohen13
Loves-to-Learn

OK ill be more specific, may index 2 contain 4 fields:

  1. event_domain
  2. TLD
  3. description
  4. input (where i got the event_domain)

after the join i notice something very interesting, if i search all of domain description than after the:
eval description=coalesce(description,"((not found))")
i get only 3K domains description out of 27K (24L or "not found")
but if i search for a specific domain description wild card: like description="dga"
i get all domains with that description, about 17K domains
why is that?

0 Karma

mcohen13
Loves-to-Learn

it worked, thanks a lot!!!

0 Karma
Get Updates on the Splunk Community!

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...