Hi Everyone,
I'm testing a simple distributed setup on a single 24-core host. The setup consists of 1 forwarder, 1 search head, and 3 search peers. Following Splunk's setup documentation seems to work OK: the forwarder is load-balancing across the peers, and I can search via the search head.
The issue is that search performance seems far slower than on a single-instance test we had. For example, a simple search (index = xxx) that yielded around 11 million records might take 10 minutes on the search head but far less time on the single instance. I was under the impression that a search head should yield shorter search times; is that not the case? Any pointers would be appreciated!
Thank you.
Edit:
- I've tried the threading & SSL changes from http://answers.splunk.com/answers/105721/distributed-search-performance.html?utm_source=typeahead&ut... without much change in search performance
- Do I need to use mounted bundles? I don't think that's needed, since everything runs on the same host (just different instances)
For now, please do not attempt to tune performance settings. Tuning blindly can have terrible effects when it isn't yet known whether there is actually anything to tune, or what and how to tune it.
You've run index=foo, which requires streaming millions of events off three sets of disks on the indexers, through the network, to the search head. For this scenario, I'd say the network may be the bottleneck.
Instead, compare this search between your single instance and the distributed setup:
index=foo some_field=bar | timechart count by some_other_field
This should make much more efficient use of the map-reduce approach behind Splunk and represents a more realistic workload. In the real world, streaming millions of raw events at users is pointless.
It's listed as a general troubleshooting approach when network performance is poor at http://docs.splunk.com/Documentation/Splunk/6.2.4/DistSearch/Troubleshootdistributedsearch#Network_p... so it should be fine.
Now your search on the left is taking basically as long in total as it takes to stream the events from peer #3, so there's little left to be gained apart from beefing up the network. For comparison, run the same search with | stats count appended in both scenarios; that should relieve the pressure on the network and reduce the gap. Still, you won't see performance gains while only searching one peer's data.
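Spelled out, the two variants to compare (using the hypothetical index name from the example above) would be these two searches. The first streams every raw event to the search head; in the second, each peer pre-aggregates and only a count crosses the network:

```
index=foo
index=foo | stats count
```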
Sorry for not updating this, but Martin has got it right: the lack of difference in search runtime was really due to my oversight of searching a single peer rather than across multiple peers (no parallelization). Comparing the runtime of a search targeting a single peer against a local search wasn't fair, as there was obvious network/communication overhead.
I see. The bottom of your screenshot shows you're actually only streaming events from one search peer. No parallelization, only added network and communication overhead, so this has to be a little slower than a local search. Make sure your forwarders are load-balancing their data properly over all three indexers.
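If the data does turn out to be skewed toward one peer, a minimal outputs.conf sketch on the forwarder might look like this (the host names and ports are hypothetical; automatic load balancing is the default behavior for a tcpout group listing multiple servers):

```
# outputs.conf on the forwarder (hypothetical hosts/ports)
[tcpout]
defaultGroup = my_indexers

[tcpout:my_indexers]
# listing all three indexers lets the forwarder rotate targets
server = idx1:9997,idx2:9997,idx3:9997
# how often (in seconds) to switch to another indexer
autoLBFrequency = 30
```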
However, the big gap you're seeing likely is finalizeRemoteTimeline. Set this in limits.conf on the SH, restart it, and run the search again:
[search]
remote_timeline_fetchall = false
Earlier I said "don't tune things yet", because we didn't know then what ate up the time. Now we do, and it's time to start tuning 🙂
Thanks Martin. Done, but we're still seeing roughly the same. I agree with you; that was what I was thinking as well. With distributed search, the only true benefit likely comes when I run a search across multiple search peers (parallelization), while a search hitting only one peer will carry overhead. I honestly did not expect that much overhead, though.
Also, are there any concerns with setting remote_timeline_fetchall to false? What're the repercussions? From limits.conf, it doesn't look like a major concern, is that correct?
Thanks again!
To get exact numbers look at the top of the job inspector. Additionally, it'll tell you where that time went down the drain.
Yep, that's what I did. I did a smaller sample run and below is the result. It doesn't look like a huge delay in the search itself (the search string is basic), but rather in local vs. remote retrieval.
Thanks for getting back!
Actually, after typing up the initial question, I ran a search on the SH and on one of the peers (#3) where the total number of results is similar. The runtime differs by about a minute or so; is this expected?
The reason we're even attempting the SH is that our working model is a little different: we use Splunk to pull the data out into a CSV. A local Splunk consultant told us we'd see much better performance with a search head + multi-indexer architecture, which makes sense, but with a small basic test we're already seeing a minute of additional runtime.
Let me see if I can get the exact numbers hmm.
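As an aside, if the end goal is a CSV and an aggregate is acceptable, one option worth testing is letting the peers aggregate before anything crosses the network and writing the file on the search head with outputcsv (the index, field, and file names below are hypothetical; outputcsv writes under $SPLUNK_HOME/var/run/splunk/csv on the search head):

```
index=foo some_field=bar
| stats count by some_other_field
| outputcsv my_results
```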