
Is there a way to run multiple searches one after another, ensuring the previous search has finished before a new search starts?

jdunlea
Contributor

Is there any way we can run multiple searches one after another, ensuring that the previous search has finished before a new search starts?

I have tried using a script to call the saved searches but it doesn't work.

For example, I want 3 searches to run one after the other, ensuring that the previous search has finished before the new search starts. I set the first one up to be scheduled and to trigger a script every time it runs. Within this script are 2 curl commands that call the other two searches.

The problem is that the curl commands "complete" and return a success message as soon as the search has been called, regardless of whether the search has actually finished. For this reason, both curl commands in the script run very quickly and both searches are kicked off almost simultaneously. I only want search number 3 to kick off once search number 2 has finished.
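For context, each curl call in the script looks roughly like this (host, credentials, and search name are placeholders). The dispatch endpoint returns a search ID as soon as the job is created, not when the search finishes:

curl -k -u admin:changeme \
    https://localhost:8089/servicesNS/admin/search/saved/searches/my_search_2/dispatch \
    -d trigger_actions=1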

Is this possible?

Jon_Webster
Splunk Employee

Why do you need to do this?


jdunlea
Contributor

I have 4 searches that run overnight, all writing/updating the same lookup file. Each one writes/updates a value in the lookup file for a specific identifier.

Each search takes between 1.5 and 2 hours to run, and the run time varies. I need each search to start ONLY after the previous one has finished, so that when a search does its "inputlookup" it reads the most recent version of the lookup file.

Sure, I could just schedule them far enough apart to ensure that the run times don't collide, but I would like a way to trigger the next search directly. With these long-running searches, 4 searches at a rough runtime of 2 hours each means a 6-hour window from search 1's kick-off time to search 4's kick-off time. I want to reduce this as much as possible, hence looking for immediate triggering of the next search.


martin_mueller
SplunkTrust

You could query the job returned by your curl through the REST API for completion, and wait while it's still running.

http://docs.splunk.com/Documentation/Splunk/6.2.1/RESTREF/RESTsearch#search.2Fjobs.2F.7Bsearch_id.7D
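For example, a script along these lines (untested sketch; host, credentials, and the naive XML parsing are placeholders to adapt):

# dispatch the first saved search and capture the sid from the XML response
sid=$(curl -s -k -u admin:changeme \
    https://localhost:8089/servicesNS/admin/search/saved/searches/search_one/dispatch \
    -d trigger_actions=1 | sed -n 's|.*<sid>\(.*\)</sid>.*|\1|p')

# poll the job endpoint until its dispatchState reports DONE
until curl -s -k -u admin:changeme \
    "https://localhost:8089/services/search/jobs/$sid" \
    | grep -q 'name="dispatchState">DONE'; do
    sleep 60
done

# only now kick off the next search
curl -s -k -u admin:changeme \
    https://localhost:8089/servicesNS/admin/search/saved/searches/search_two/dispatch \
    -d trigger_actions=1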

Alternatively, munge your searches into one like this (untested, but subsearches should evaluate sequentially inner-to-outer):

| savedsearch search_three | append [savedsearch search_two | append [savedsearch search_one]]

http://docs.splunk.com/Documentation/Splunk/6.2.1/SearchReference/savedsearch

martin_mueller
SplunkTrust

Have your script check the job's status, sleep for a minute, repeat (maybe re-login to the API if your session expires). That should work regardless of job duration.

As an entirely different thought, you may want to check whether there's a reasonable way to use the 6.2 KV store instead of a traditional lookup file. It seems to me your core issue is avoiding overwriting one search's results with old values that another search read hours ago, i.e. the whole | inputlookup | do something | outputlookup pattern.
I'd think it might be possible to use the KV store's ability to target individual objects in the store for update to avoid this very issue in the first place. Start all jobs, let each update their own object, not worry about competing overwrites... if reality matches my thoughts, that is.
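For illustration (app, collection, and key names are hypothetical, and this assumes a KV store collection already defined in collections.conf), each job could update just its own record through the REST API:

# update only the record whose _key is identifier42; other records are untouched
curl -k -u admin:changeme \
    https://localhost:8089/servicesNS/nobody/myapp/storage/collections/data/mycollection/identifier42 \
    -H 'Content-Type: application/json' \
    -d '{"value": "example"}'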


jdunlea
Contributor

Unfortunately we have just recently upgraded to 6.1.4, so we will not be moving to 6.2 any time soon. Thank you for the suggestion, though!


cpride_splunk
Splunk Employee

If you are using the REST API for queries you should also have the "sid". In that case you could use the "loadjob" command with the sid to chain from one job to the next. If you are worried about the previous job expiring before you use it, you can always set the ttl with the REST API as well. http://docs.splunk.com/Documentation/Splunk/6.2.1/RESTREF/RESTsearch#search.2Fjobs.2F.7Bsearch_id.7D...

So you end up with something like:

POST /services/search/jobs "search 1"                    -> get sid from response
while (/services/search/jobs/search1_sid status != DONE)
    sleep 1

POST /services/search/jobs "| loadjob search1_sid | ..." -> get sid from response
...
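Concretely, bumping the ttl looks something like this (untested; credentials and the ttl value are placeholders, and $sid is the value extracted from the dispatch response, as in the polling sketch above):

# keep the finished job around for 24 hours so the next search can loadjob it
curl -s -k -u admin:changeme \
    "https://localhost:8089/services/search/jobs/$sid/control" \
    -d action=setttl -d ttl=86400

The follow-on search can then start with | loadjob $sid and pick up exactly where the previous job left off.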

0 Karma

jdunlea
Contributor

I was going to do the subsearches example. In fact I tried it and had them run in reverse order (order doesn't actually matter in my case), but subsearches auto-finalize after 60 seconds, and each of my searches has a run time upwards of 1.5 hours.

I may do the REST API check. That makes more sense! Would there be any issue in waiting for the completion of a search even if it is going to take 1.5-2 hours to complete? I could always do a periodic check, I suppose.
