I'm experiencing quite slow executions of host:8000/custom/SA-ThreatIntelligence/notable_events/update_status when editing a notable event through the Incident Review view. Here's a Gantt chart of a slow-feeling execution:
Built based on notable_event_update_rest_handler.log
and reviewstatuses_rest_handler.log
. The biggest offenders are getCorrelationSearches, commentLengthRequired, isUrgencyOverrideAllowed, getStatusLabelMap and getDefaultStatus, the others are just calling those and inheriting their slowness.
Those all sound as if they're basically loading configuration - any hint on speeding that up or further debugging?
Splunk 6.1.4 on Windows 2008R2, running ES 3.1.1.
You are correct, the redundant calls should not be performed. Look to a future maintenance release of ES to improve performance (we actually have some other things we are looking into that should improve performance).
3.2.1 released a few days ago still uses this extremely slow import. A minute of fiddling with a few .py files after updating takes https://localhost:8000/en-US/custom/SA-ThreatIntelligence/notable_info/all from 13.2s to 10s, for example. Still terrible, but it was 32% more terrible before the fiddling...
Also, it seems there are slow calls being made from the main UI thread synchronously, as complained about by my browser:
"Progress" is all I wanted to hear, thanks 🙂
Hey Luke,
Any news about that maintenance release?
I can't be very specific but I can tell you that it should be soon. I actually reviewed the change-list for this particular problem myself. Sorry for having to be cagey about the dates.
An example of optimization potential, the Incident Review view loads a list of Notable Event owners for the filter in the top left corner via a search... and calls a custom .py endpoint to load a list of Notable Event owners for the log review popup thingy. That'll also help declutter code redundancy... and make customization of that list easier.
Looking forward as hard as my eyes can take 😉
While you're at it, are there plans to reduce the looking-for-summaries overhead that occurs when launching a search? Apparently all saved searches visible in the current context are examined, taking 30ms-ish per saved search on a testing instance. With hundreds of globally visible saved searches in ES, that adds up to several seconds of overhead for every search. Here's a pastebin, look around line 146ish: http://pastebin.com/hg3AtJCe
It doesn't explicitly say "going through all visible saved searches", but I've reproduced that with a naked Splunk. No saved searches, 0.1s for | stats count
- 75 saved searches, 0.5s - 150 saved searches, 1.0s. Turning them app-visible and running the search in a different app behaves as if the saved searches didn't exist, so the globally visible ones are the issue.
I fear making that faster underneath is a core Splunk issue rather than ES, so in the meantime - what would happen if one were to make the saved searches underneath a correlation search app-visible only? Didn't get around to test that on a different machine yet 😞
Replacing this
from splunk.appserver.mrsparkle.lib.util import make_splunkhome_path
with this
#from splunk.appserver.mrsparkle.lib.util import make_splunkhome_path
def make_splunkhome_path(args):
return os.path.join(os.getenv("SPLUNK_HOME"), *args)
(and importing os where necessary) in a dozen or two .py files in SA-ThreatIntelligence/bin
has shaved eight seconds off the REST call for saving notable events.
Update: As documented in #splunk, I've dug a little deeper using commentLengthRequired as my guinea pig.
Underneath it seems to be calling | rest /services/alerts/log_review/comment
(not through the search command of course), which is taking up the biggest chunk of the time. I've replicated this on my laptop with just the SA-ThreatIntelligence app enabled to turn off any other noise that might add time... still takes ages for such a simple config call.
Digging deeper into log_review_rest_handler.py
, apparently two seconds each are spent doing who knows what before entering either the setup
or handleList
method:
127.0.0.1 - admin [06/Nov/2014:13:12:17.666 +0100] "GET /services/alerts/log_review/comment?count=0 HTTP/1.0" 200 4172 - - - 7813ms
2014-11-06 13:12:19,348 DEBUG In setup
2014-11-06 13:12:19,348 DEBUG Out setup
2014-11-06 13:12:21,598 DEBUG In handleList
You can see I'm calling the REST endpoint at 13:12:17, setup
gets called two seconds later and finishes in no time. Another two seconds later handleList
gets called - those four seconds are spent somewhere in between the various layers of Splunkiness, and happen on several of these custom REST calls. In total, they add up to most of the time I'm seeing the notable_events/update_status
consume.