After upgrading to v8.0.1 we noticed that many of our long-running scheduled searches are ending up in a "Finalized" state, instead of a "Done" state. We also suspect that our results are now incomplete. What is happening?
New in 8.* (and possibly in later v7 releases, too) is the workload management feature:
https://docs.splunk.com/Documentation/Splunk/8.0.1/Workloads/Keyconcepts
I suspect that we have been assigned (possibly by default) tighter constraints on our searches than we had before the upgrade, and that this is crippling our long-running searches.
We looked at a search that should be writing 100K rows to a lookup and found that the results now contain only 325 rows. Further inspection of the search.log yields this critical tidbit:
• info : Search auto-finalized after disk usage limit (400MB) reached.
• info : Search finalized.
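If you want to check a saved copy of search.log for this message without reading it by eye, a minimal Python sketch follows. It matches only the wording quoted above; the helper name `find_auto_finalize` is hypothetical, and treating other limits or Splunk versions as using the same message format is an assumption.

```python
import re

# Pattern built from the message quoted above. Assumption: other limits
# and other Splunk versions phrase the message the same way.
AUTO_FINALIZE = re.compile(
    r"Search auto-finalized after (?P<reason>.+?) \((?P<limit>[^)]+)\) reached"
)

def find_auto_finalize(log_lines):
    """Return a (reason, limit) tuple for every auto-finalize message found."""
    hits = []
    for line in log_lines:
        match = AUTO_FINALIZE.search(line)
        if match:
            hits.append((match.group("reason"), match.group("limit")))
    return hits

# The two lines quoted from our search.log.
sample = [
    "info : Search auto-finalized after disk usage limit (400MB) reached.",
    "info : Search finalized.",
]
print(find_auto_finalize(sample))
```

An empty list means no auto-finalize message was found in the lines you passed in; it does not prove the search completed normally.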
So we need to do several things to fix this.
First, change the way that we work: for any search of substantial size, be sure to check the job inspector before trusting the results. If you are unsure or need to look back in time, you can go to “Activity” -> “Jobs” and look for any search that says Finalized instead of Done. Good searches show Done; compromised/bad searches show Finalized.
From now on, we must be diligent to watch for and NEVER use/trust any Finalized search, because it ABSOLUTELY has partial results.
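The same check can be scripted instead of eyeballing the Jobs page. The sketch below assumes you have already fetched the job list from the REST endpoint `/services/search/jobs` with `output_mode=json` and parsed it into a dict, and that each entry carries an `isFinalized` flag under `content`. Verify the field names against your own instance. The helper name `finalized_jobs` and the sample payload are hypothetical.

```python
def finalized_jobs(jobs_response):
    """Return (sid, label) for every job flagged as finalized.

    Assumes the JSON layout of /services/search/jobs: a top-level
    "entry" list whose items have a "name" (the search id) and a
    "content" dict with an "isFinalized" boolean.
    """
    flagged = []
    for entry in jobs_response.get("entry", []):
        content = entry.get("content", {})
        if content.get("isFinalized"):
            flagged.append((entry.get("name"), content.get("label", "")))
    return flagged

# Hypothetical payload, trimmed to the fields used above.
sample_response = {
    "entry": [
        {"name": "sid_1", "content": {"label": "nightly lookup", "isFinalized": True}},
        {"name": "sid_2", "content": {"label": "hourly summary", "isFinalized": False}},
    ]
}
print(finalized_jobs(sample_response))
```

Treat anything this returns as holding partial results and rerun it once the limits are sorted out.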
Second, we must no longer allow ANY scheduled searches to be run by any regular user. As soon as a search is “good to go”, we need to let the Splunk admin know so that he can move the ownership to nobody or a system account, so that the search runs with more liberal constraints.
Ouch. That definitely puts a crimp in our upgrade plans. We currently allow regular users to have liberal constraints and to create/run/modify scheduled searches in their apps. Adding constraints to those users, or moving ownership of their searches so that they can't update them, adds operational processes that have not been accounted for.
Thanks @woodcock
This should not be a problem. Just be sure to check out the feature's defaults BEFORE you upgrade and make sure that you set them appropriately.