Solved: Performing a join against the saved results of a scheduled saved search

rtadams89 · 12-20-2011

I have been using a complex search query (I can't easily post it here without exposing internal information) that performs a join on a subsearch. The subsearch looks at a lot of data and can therefore take some time to run. I regularly get "auto-finalized after time limit reached" errors on the subsearch part of the query. I've looked into fixing this by adjusting limits.conf or by using a lookup table instead, but would prefer a solution that requires neither.

I thought I could replace join type=outer joinOn [search my sub search string] with join type=outer joinOn [savedsearch subsearch1] and move what was previously "my sub search string" into a saved search named "subsearch1". This works, but I still get the auto-finalized error. I then scheduled the "subsearch1" saved search to run every 15 minutes and to cache its results for an hour, but when I run the main outer search, the join subsearch still seems to run live (instead of using the cached results from the last scheduled run of subsearch1), because I still get the auto-finalized error.

Is there no way to have the join operation performed against the most recent results from the scheduled saved search?

rtadams89 · 12-20-2011

The solution seems to be to use | join type=outer joinOn [loadjob savedsearch="myuserid:search:subsearch1"] |. However, to get this to work I had to do two things:

1. The subsearch1 saved search had to have run on its schedule. Running the job manually would show the results of that run in the "View Recent" list for the search, but the main search could not use those results.
2. The subsearch1 saved search had to have its permissions set to "Read" for everyone. I'm not sure why this is, as I am running all the related searches under my own user context, but I received an error about not being able to find the job when the saved search was set to private.

The original problem I had has been solved by this, but if anyone knows why 1 and 2 above apply, I would be interested in feedback.
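
For anyone skimming, the fix above can be sketched as a complete search. This is only an outline under stated assumptions: <main search> stands in for the outer search, which was never posted; joinOn is the placeholder join field used in this thread; and the leading pipe inside the brackets is an assumption based on how generating commands such as loadjob are normally written in a subsearch (the post itself shows it without the pipe).

```
<main search>
| join type=outer joinOn
    [| loadjob savedsearch="myuserid:search:subsearch1"]
```

The savedsearch value follows the owner:app:name pattern from the post (user "myuserid", app "search", saved search "subsearch1"). Unlike savedsearch, which runs the saved search again, loadjob loads the results of a job that has already completed, which would be consistent with the auto-finalize error going away.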

the_wolverine · 09-06-2013

This was very helpful for my situation. Thank you!

rtadams89 · 12-20-2011

Yes. The subsearch is basically querying the index filled by the Splunk Active Directory monitor app. As such, creating a summary index would result in roughly the same number of events (though potentially a bit less data per event) in that summary index.

At any rate, I would really like to know whether it is possible to run a join on the cached results from a scheduled search, as this seems like it could be beneficial in many other scenarios too.

kristian_kolb · 12-20-2011

Have you ruled out using summary indexes?
/k
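
For readers who have not used summary indexes, the suggestion could look roughly like the sketch below. Everything in angle brackets is a hypothetical placeholder that does not come from this thread, and the summary index would have to exist already. A scheduled search writes its reduced results to the summary index with collect:

```
<base search over the AD monitor index>
| fields joinOn, <other fields the join needs>
| collect index=<your_summary_index>
```

The main search would then join against the summary index rather than the raw data:

```
<main search>
| join type=outer joinOn
    [search index=<your_summary_index>]
```

As noted elsewhere in the thread, this only helps if the summary holds noticeably less data than the source index, and the join is still a subsearch subject to the same limits.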
