Efficient way to look for values across hundreds of millions (or billions) of events?

howyagoin
Contributor

Hi,

I've got a sourcetype with around 100,000 distinct values for a field across 225,000,000 events per day, and another sourcetype with a total of around 5,000 values/events that is essentially static (very little changes over the course of a year).

What is the most efficient way to find out whether any values from the second sourcetype occur in the first, possibly going back 30+ days? I was leaning towards a summary-index-based search, scheduled every few hours, that extracts the unique values of the large sourcetype so I can then check the smaller one against it. But even that would take a while.
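A minimal sketch of that summary-index idea, assuming a pre-created summary index named big_values_summary (a hypothetical name) and the field name Value used elsewhere in this thread. The first search would be scheduled every few hours over the preceding window and stores one row per distinct value:

```spl
sourcetype=big earliest=-4h@h latest=@h
| stats count BY Value
| collect index=big_values_summary
```

The check then runs over the much smaller summary rather than the raw events. The earliest=0 in the subsearch is an assumption, added in case the static small sourcetype was indexed long before the 30-day window:

```spl
index=big_values_summary earliest=-30d@d [ search sourcetype=small earliest=0 | dedup Value | return 6000 Value ]
| stats count BY Value
```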

Looking at the various options, such as "return" and "join" (or others), I'm not sure which is the most efficient.

I don't want all of the matching events from the larger sourcetype; I just want a list of the smaller sourcetype's values that also occur in the much larger sourcetype.

Thanks!

1 Solution

gkanapathy
Splunk Employee

A subsearch should be the most efficient approach here:

sourcetype=big [ sourcetype=small | return 6000 Value ] 
| dedup Value

This assumes that Value is the name of the field containing the value, and that the name is the same in both sourcetypes. If not, small tweaks to the return command can handle it.
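For example, if the field were called small_value in the small sourcetype and big_value in the big one (both hypothetical names), the alias=field form of return renames it on the way out of the subsearch. This sketch also ends with a plain list of matching values instead of one event per value; earliest=0 in the subsearch is an assumption, in case the static small sourcetype was indexed outside the outer search's time range:

```spl
sourcetype=big earliest=-30d@d [ search sourcetype=small earliest=0 | dedup small_value | return 6000 big_value=small_value ]
| stats count BY big_value
```

The subsearch expands to something like (big_value="a") OR (big_value="b") OR ..., so the values act as search terms and the outer search should only need to retrieve events containing them. The 6000 leaves some headroom over the roughly 5,000 values in the small sourcetype.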

howyagoin
Contributor

Thought so; I was already doing that, but it's still going to take many hours (days?) to run. I'll likely have to build a better mousetrap here, as the data is just too vast to run the full 30 days' worth of searching I need.
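One possible shape for that better mousetrap, offered only as a sketch: because the small sourcetype barely changes, its values could be saved once to a lookup file, and a daily scheduled search over the previous day would record only the matches in a summary index. The names small_values.csv and small_value_hits are hypothetical, and the summary index would need to exist beforehand. Save the static values once (and rerun whenever they change):

```spl
sourcetype=small earliest=0 | dedup Value | table Value | outputlookup small_values.csv
```

Scheduled daily, covering yesterday only:

```spl
sourcetype=big earliest=-1d@d latest=@d [ | inputlookup small_values.csv | return 6000 Value ]
| stats count BY Value
| collect index=small_value_hits
```

The 30-day answer then comes from the summary:

```spl
index=small_value_hits earliest=-30d@d | stats sum(count) AS hits BY Value
```

Each daily run scans one day of the big sourcetype (about 225,000,000 events), and the 30-day question reads at most about 5,000 summary rows per day instead of roughly 6.75 billion raw events (30 × 225,000,000).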