Splunk Search

Looking for way to explain why subsearches are so slow

bcronrath
Path Finder

I've seen it suggested before and definitely have witnessed myself that for searches involving any significant amount of data, it's always light years faster to grab all the data and then figure out a way to correlate it at a later time via stats, versus using a subsearch in your base query. To illustrate what I mean, say for example you have two sourcetypes "left" and "right", each containing their own set of data that has a shared unique identifier that can correlate the data we'll call "unique_id". So why does a search like this:

index=left sourcetype=left [search index=right sourcetype=right | stats count by unique_id| fields unique_id] | stats count

Take massively longer (and in a lot of cases just timeout indefinitely due to memory limits of a subsearch being exceeded) than something like this:

(index=left sourcetype=left) OR (index=right sourcetype=right) 
| eval left_count=if(sourcetype=left,1,0)
| stats values(sourcetype) as sourcetypes, sum(left_count) as left_count by unique_id
| search sourcetypes=*left* sourcetypes=*right*
| stats sum(left_count) as count

I'm wondering why subsearch is always so much slower for something like this?

Tags (1)
0 Karma

koshyk
Super Champion

the sub-search in your first example, is fast when the output rows in your subsearch is less or low in numbers (eg less than 100)
In our testing, the parsing of the sub-search also takes time as Splunk takes time to "get all the values" before proceeding further.

The subsearch is in square brackets and is run first. You also need to see the "expanded search" in the "Job" tab to see how the results are then passed as key-value fields to the outer search (You can see how complex sub-search is !!). This becomes too time consuming when the results of subsearch exceed 1000 rows.

also have a look into "Sub-search Limitations" (eg 10K results) and "Performance Considerations"
https://docs.splunk.com/Documentation/Splunk/8.0.2/Search/Aboutsubsearches

The 2nd search is like a normal Splunk search whereby it can query the data in parallel and highly efficient.

0 Karma
Get Updates on the Splunk Community!

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...