Solved: Efficient way to look for values across many milli...

howyagoin · ‎03-04-2013

Hi,

I've got a sourcetype with a field that takes around 100,000 distinct values across 225,000,000 events per day, and another sourcetype which has a total of around 5,000 values/events and is static (very little change over the course of a year).

What is the most efficient way to find out whether any value from the second sourcetype occurs in the first, possibly going back 30+ days? I was leaning towards a summary-index based query run every few hours to extract the unique values of the large sourcetype, then checking the smaller against that, but even that would take a while.

I've looked at the various options, such as "return" and "join" (or others), but I'm not sure which is the most efficient.

I don't want every event from the larger sourcetype that contains one of the smaller sourcetype's values; I just want a list of the smaller sourcetype's values that also occur in the much larger sourcetype.

Thanks!

gkanapathy · ‎03-04-2013

Subsearch should be most efficient here:

sourcetype=big [ search sourcetype=small | return 6000 Value ]
| dedup Value

This assumes that Value is the name of the field containing the value and that it is the same in both sourcetypes. If it isn't, small tweaks to the return command can handle that.
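
For the case where the field names differ, a minimal sketch of the return tweak might look like the following. The names small_value and big_value are hypothetical stand-ins for whatever the field is actually called in each sourcetype. The alias=field form of return renames the field in the generated filter, so the outer search ends up filtering on big_value rather than small_value.

```spl
sourcetype=big
    [ search sourcetype=small
      | dedup small_value
      | return 6000 big_value=small_value ]
| dedup big_value
| table big_value
```

The dedup inside the subsearch is optional; it just keeps repeated values from using up the 6000 slots. Subsearches also have their own result-count and runtime limits (the [subsearch] stanza in limits.conf), so it is worth checking those if the small list ever grows.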

howyagoin · ‎03-05-2013

Thought so - was doing that, but it's still going to take many hours (days?) to run. I'll likely have to build a better mousetrap here, as the data is just too vast to query the full 30 days' worth that I need.
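
One possible shape for the summary-index idea from the original question, as a rough sketch only. The index name example_value_summary is hypothetical and would have to exist already, and the 4-hour window is just an assumed schedule. The first search would run as a scheduled search and store one row per distinct Value for each window; the second would then check the small list against 30 days of those rows instead of against the raw events.

```spl
sourcetype=big earliest=-4h@h latest=@h
| stats count by Value
| collect index=example_value_summary

index=example_value_summary
    [ search sourcetype=small | dedup Value | return 6000 Value ]
| stats sum(count) AS events by Value
```

The second search would be run over the last 30 days, and the subsearch needs a time range that actually covers the events in the small sourcetype, which may not be the same range. Whether this is fast enough depends on how many distinct values land in each window; at around 100,000 values per day the summary should still be far smaller than the raw data.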

Efficient way to look for values across many millions (hundreds of, or, billions) of events?
