Symptoms:
It usually happens within a couple of hours after we manually delete the stuck search jobs.
It only happens to particular searches.
Once it happens, how long does the affected search stay stuck with the status of "finalizing"?
The affected search jobs stay stuck with the status of "finalizing" until we manually delete them.
The job status shows 100% completed in the job inspector (when viewed via "Inspect Job").
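To spot the affected jobs without opening each one in the job inspector, something like the following could help. It is only a rough sketch: it assumes you have already saved the JSON returned by the search jobs REST listing (services/search/jobs with output_mode=json), and that each entry exposes `sid`, `dispatchState` and `doneProgress` under `content`. Please verify those field names against your Splunk version. The function name `stuck_finalizing_jobs` is made up for illustration.

```python
import json
from typing import List


def stuck_finalizing_jobs(jobs_json: str) -> List[str]:
    """Return sids of jobs that report FINALIZING while already 100% done.

    Assumes the payload shape {"entry": [{"name": ..., "content": {...}}]};
    the field names are assumptions, not confirmed by the original post.
    """
    payload = json.loads(jobs_json)
    stuck = []
    for entry in payload.get("entry", []):
        content = entry.get("content", {})
        state = str(content.get("dispatchState", "")).upper()
        progress = float(content.get("doneProgress", 0))
        # Matches the symptom above: finalizing, yet progress already at 100%.
        if state == "FINALIZING" and progress >= 1.0:
            # Fall back to the entry name if no sid field is present.
            stuck.append(content.get("sid", entry.get("name", "")))
    return stuck
```

Running this periodically against a fresh listing should show whether the same searches keep reappearing after a manual delete.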
Solution:
1. Bump up max_chunk_queue_size in limits.conf on the search heads.
This reduces the need to pause the search result collation queues, which makes hitting the issue less likely.
[search]
result_queue_max_size = 200000000
max_chunk_queue_size = 5000000
fetch_remote_search_log = false
Details about the above parameters:
result_queue_max_size =
* The maximum size, in MB, that will be kept from peers for processing on
the search head before throttling the rate that data is accepted.
* The “results_queue_min_size” value takes precedence. The number of search
results chunks specified by “results_queue_min_size” will always be
retained in the queue even if the combined size in MB exceeds the
“result_queue_max_size” value.
* Default: 100
max_chunk_queue_size =
* The maximum size of the chunk queue.
* Default: 10000000
Updating the above parameters reduced the need to pause the search result collation queues, which makes hitting the issue less likely, and this fixed our issue.
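If you want to double-check that the stanza actually landed on each search head, a small script along these lines might be useful. It is a sketch under some assumptions: it treats limits.conf as plain INI text and uses Python's configparser, which handles a simple stanza like the one above but is not a full Splunk conf parser and does not merge app/local layers (btool on the search head is the better source for the effective value). The names `EXPECTED` and `check_search_stanza` are hypothetical.

```python
import configparser

# Values taken from the [search] stanza posted above.
EXPECTED = {
    "result_queue_max_size": "200000000",
    "max_chunk_queue_size": "5000000",
    "fetch_remote_search_log": "false",
}


def check_search_stanza(conf_text, expected):
    """Return {key: (found, wanted)} for every setting that differs."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section("search"):
        # No [search] stanza at all: every expected key is missing.
        return {key: (None, want) for key, want in expected.items()}
    mismatches = {}
    for key, want in expected.items():
        found = parser.get("search", key, fallback=None)
        if found != want:
            mismatches[key] = (found, want)
    return mismatches
```

An empty result means the file contains the values from this answer; anything else lists what was found next to what was expected.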
Great question/answer, please accept your answer unless you're waiting for alternative answers!