Hi there! We have an environment with a single search head and 14 search peers, and our distributed searches seem to take much longer than they should. A given search takes ~15-20 seconds to complete, regardless of whether it's scoped to the last minute or the last week.
What's really interesting is that if I run a search from the search head but scope it to a single indexer with "splunk_server=", it completes in almost exactly the same time as running it locally on that indexer. However, as soon as I add even a second indexer, the search time goes up disproportionately (in most of my tests, from <1s to >5s).
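For reference, the comparison looks something like this (the index and peer names here are just examples; the actual searches vary):

    Scoped to one indexer (behaves like running locally on that peer):
    index=_internal earliest=-1m splunk_server=TTNET-NY-SPIND-1.ttnet.local | stats count

    With a second indexer added (this is where the time jumps):
    index=_internal earliest=-1m (splunk_server=TTNET-NY-SPIND-1.ttnet.local OR splunk_server=TTNET-NY-SPIND-2.ttnet.local) | stats count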
What gives? I find it odd that distributing to even just two peers takes much longer than the sum of the individual searches on those peers. Are there any performance or tuning parameters we should be looking at? Our environment is almost completely vanilla Splunk, and the knowledge bundle we distribute is only ~500K, so I'm not sure what else to look at.
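The only related knobs I've found so far are the peer timeouts in distsearch.conf on the search head. We haven't changed them from the defaults; for reference, this is the stanza I'd experiment with (values are illustrative, not a recommendation):

    # $SPLUNK_HOME/etc/system/local/distsearch.conf on the search head
    [distributedSearch]
    # seconds allowed to establish a connection to each search peer
    connectionTimeout = 10
    # send/receive timeouts (seconds) for traffic to/from the peers
    sendTimeout = 30
    receiveTimeout = 60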
Any input? Thanks for the help!
EDIT
Yep, no problem. The full Job Inspector breakdown for a representative search is below.
It looks like for any given search, we spend ~8-10 seconds in dispatch.createProviderQueue. This is the single constant that's jumping out at me across searches, regardless of how big or small the search is.
I checked the Splunk manual, and this value is supposed to be the time it takes to connect to each search peer. Our highest-latency peers are ~150ms away, so I'm not sure how this value gets so high.
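To rule out plain network latency, my plan is to time a bare connection from the search head to each peer's management port (assuming the default port 8089; the request will get a 401 without credentials, but I only care about the connect time):

    # time a TLS connect + request to one peer's management port;
    # -k because our certs are self-signed
    time curl -k -s -o /dev/null https://TTNET-JP-SPIND-1.ttnet.local:8089/services/server/info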
Let me know if you need anything else from me. Thanks again!
Duration (s)  Component                                            Invocations  Input count  Output count
0.039         command.fields                                       51           7,972        7,972
0.43          command.remotetl                                     51           7,972        -
0.784         command.search                                       51           -            7,972
0.165         command.search.calcfields                            36           8,007        8,007
0.043         command.search.index                                 51           -            -
0.036         command.search.fieldalias                            36           8,007        8,007
0.036         command.search.filter                                36           -            -
0             command.search.index.usec_1_8                        63           -            -
0             command.search.index.usec_8_64                       16           -            -
0.274         command.search.kv                                    36           -            -
0.199         command.search.rawdata                               36           -            -
0.139         command.search.typer                                 51           7,972        7,972
0.036         command.search.lookups                               36           8,007        8,007
0.034         command.search.tags                                  51           7,972        7,972
0.018         command.search.summary                               51           -            -
0.001         dispatch.check_disk_usage                            1            -            -
8.482         dispatch.createProviderQueue                         1            -            -
0.043         dispatch.evaluate                                    1            -            -
0.042         dispatch.evaluate.search                             1            -            -
1.057         dispatch.fetch                                       66           -            -
0.001         dispatch.preview                                     1            -            -
0.475         dispatch.process_remote_timeline                     25           1,347,453    -
5.116         dispatch.readEventsInResults                         1            -            -
0.118         dispatch.remote_timeline_fullevents                  25           1,401,267    1,417
0.078         dispatch.stream.local                                4            -            -
1.11          dispatch.stream.remote                               47           -            2,589,235
0.179         dispatch.stream.remote.TTNET-CH-SPIND-1.ttnet.local  4            -            280,901
0.145         dispatch.stream.remote.TTNET-NY-SPIND-2.ttnet.local  4            -            284,779
0.134         dispatch.stream.remote.TTNET-NY-SPIND-1.ttnet.local  4            -            239,333
0.119         dispatch.stream.remote.TTNET-DE-SPIND-1.ttnet.local  3            -            172,445
0.115         dispatch.stream.remote.TTNET-UK-SPIND-2.ttnet.local  4            -            237,858
0.093         dispatch.stream.remote.TTNET-UK-SPIND-1.ttnet.local  4            -            278,894
0.086         dispatch.stream.remote.TTNET-JP-SPIND-1.ttnet.local  4            -            199,469
0.048         dispatch.stream.remote.TTNET-SY-SPIND-1.ttnet.local  3            -            176,112
0.047         dispatch.stream.remote.TTNET-JP-SPIND-2.ttnet.local  3            -            127,786
0.037         dispatch.stream.remote.TTNET-SG-SPIND-1.ttnet.local  3            -            157,387
0.036         dispatch.stream.remote.TTNET-SG-SPIND-2.ttnet.local  3            -            173,048
0.031         dispatch.stream.remote.TTNET-DE-SPIND-2.ttnet.local  3            -            107,137
0.029         dispatch.stream.remote.TTNET-SY-SPIND-2.ttnet.local  3            -            137,724
0.011         dispatch.stream.remote.TTNET-CH-SPIND-2.ttnet.local  2            -            16,362
0.305         dispatch.timeline                                    66           -            -
0.019         dispatch.writeStatus                                 7            -            -
0.042         startup.handoff                                      1            -            -