Splunk Search

Efficient way to look for values across hundreds of millions (or billions) of events?

howyagoin
Contributor

Hi,

I've got a sourcetype which has around 100,000 distinct values for a field across 225,000,000 events per day, and another sourcetype which has a total of around 5,000 values/events and is static (very little change over the course of a year).

What is the most efficient way to find out IF the second sourcetype has any occurrence in the first, possibly going back 30+ days? I was leaning towards a summary-index based query conducted every few hours, to extract the unique values of the large sourcetype, then check the smaller against that - but even that would take a while.

I've looked at the various options, such as "return" and "join" (or others), but I'm not sure which is the most efficient.

I don't want all of the values from the larger source that contain the smaller, indeed, I just want a list of the smaller sourcetype values that also occur in the much larger sourcetype.

Thanks!

1 Solution

gkanapathy
Splunk Employee

Subsearch should be most efficient here:

sourcetype=big [ search sourcetype=small | return 6000 Value ]
| dedup Value

This assumes that Value is the name of the field containing the value and that it is the same in both sourcetypes. If not, there are small tweaks to the return command to handle it.
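
If the field is named differently in the two sourcetypes, one such tweak is the alias form of return. This is only a sketch: it assumes the small sourcetype keeps its values in a hypothetical field called small_value, while the big sourcetype uses Value.

```
sourcetype=big [ search sourcetype=small | dedup small_value | return 6000 Value=small_value ]
| dedup Value
```

The subsearch should expand to a series of Value=... terms joined with OR, so the outer search only retrieves events that match one of the small sourcetype's values. Subsearches are subject to result-count and runtime limits set in limits.conf, so it is worth checking that all of the roughly 5,000 values actually come back.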


howyagoin
Contributor

Thought so - I was doing that, but it's still going to take many hours (days?) to run. I'll likely have to build a better mousetrap here, as the data is just too vast to do the full 30 days' worth of querying I need.
