We have seen the reference hardware specifications for performance and scaling. How about the following scenario?
What is the difference between, let's say:
3 servers, each with:
12 physical cores, 32GB RAM, 800 IOPS per server
versus
ONE big virtual machine with:
36 physical cores, 96GB RAM, 2400 IOPS
Many thanks,
Unless you dedicate resources to the virtual machine, it is unlikely to perform as well as 3 smaller physical machines. In fact, you will incur roughly a 10% indexing performance penalty simply by running virtualized (worse if there is contention for CPU, RAM, or disk).
Also, a single indexing pipeline uses about 4 CPUs per machine, so 3 servers would have 12 CPUs' worth of indexing horsepower. As of Splunk 6.3, you can run up to 2 indexing pipelines (the maximum recommended), which would give you 8 CPUs' worth of indexing horsepower on the "big server". That is less than the 3 smaller servers provide, but it leaves more CPUs available for handling searches. If that meets your needs, it's probably fine.
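To make the arithmetic above concrete, here is a rough CPU budget for both layouts. It is only a sketch: the 4-cores-per-pipeline figure and the roughly 10% virtualization penalty are the rules of thumb from this thread, not official sizing numbers, and `cpu_budget` is a hypothetical helper for illustration.

```python
# Rough CPU budget for the two layouts in the question.
# Assumptions taken from this thread (not from official sizing docs):
#   - one indexing pipeline keeps about 4 cores busy
#   - running virtualized costs about 10% of indexing performance
# Whatever is not used for indexing is counted as available for search.
CORES_PER_PIPELINE = 4

def cpu_budget(servers, cores_per_server, pipelines_per_server, virt_penalty=0.0):
    """Split total cores into indexing cores and cores left for search."""
    total = servers * cores_per_server
    indexing = servers * pipelines_per_server * CORES_PER_PIPELINE
    return {
        "total": total,
        "indexing": indexing,
        # indexing capacity after the assumed virtualization penalty
        "effective_indexing": indexing * (1 - virt_penalty),
        "left_for_search": total - indexing,
    }

three_small = cpu_budget(servers=3, cores_per_server=12, pipelines_per_server=1)
one_big = cpu_budget(servers=1, cores_per_server=36, pipelines_per_server=2, virt_penalty=0.10)

print("3 x 12-core physical servers:", three_small)
print("1 x 36-core VM, 2 pipelines: ", one_big)
```

With these assumptions, the three servers put 12 cores toward indexing and leave 24 for search, while the VM puts 8 toward indexing (roughly 7.2 after the assumed penalty) and leaves 28.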
Personally, if I were setting this up, I would want the 3 servers, with their known dedicated resources and the inherent redundancy that comes with them. But if you are going for ease of use and simplified maintenance/administration, 1 server does fit the bill.
Info on indexing parallelization:
http://docs.splunk.com/Documentation/Splunk/6.3.0/Capacity/Parallelization
I should add: if you go with the larger system, your expansion option is to add another identical large system, whereas with 3 smaller servers you can grow by adding a single identical smaller system.
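The difference in expansion granularity can be shown with a few lines of arithmetic. This sketch assumes you always scale out by adding one node identical to the existing ones, using the core counts from the question; `growth_steps` is a hypothetical helper, and the same pattern applies to RAM or IOPS.

```python
# Total cores after adding identical nodes one at a time.
# Node sizes come from the question; the helper is hypothetical.
def growth_steps(start_nodes: int, cores_per_node: int, steps: int) -> list[int]:
    """Return total cores after adding 0, 1, ..., steps identical nodes."""
    return [(start_nodes + n) * cores_per_node for n in range(steps + 1)]

small = growth_steps(start_nodes=3, cores_per_node=12, steps=3)
big = growth_steps(start_nodes=1, cores_per_node=36, steps=1)

print("3 small servers, +1 at a time:", small)  # [36, 48, 60, 72]
print("1 big VM, +1 at a time:       ", big)    # [36, 72]
```

Both paths reach 72 cores, but the smaller servers get there in 12-core increments while the big VM jumps straight from 36 to 72.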
Where can I find a reference doc/link for the statement that a single indexing pipeline uses 4 CPUs per machine?
I had never heard before that one indexer has a maximum usable capacity for indexing, or perhaps for searching as well.
Thanks,
I'm not sure the docs reference the 4-processor usage for indexing. However, the indexing pipeline has four distinct queues and processes, and each of them tends to use the better part of a CPU. That is why we say that 4 CPUs are used per pipeline.
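The reasoning above (four stages, each keeping roughly one CPU busy) comes from the structure of a staged pipeline: every stage has its own input queue and its own worker, so all four can be busy at the same time. The sketch below models only that structure. The stage names follow the commonly described parsing, merging (aggregation), typing, and indexing steps, but the work done in each stage is a toy placeholder and not what Splunk actually does. Also note that in CPython the GIL keeps these threads from truly occupying four cores; a native implementation does not have that limit.

```python
import queue
import threading

# Conceptual model only: four stages, each with its own input queue and
# worker thread. Stage names are assumptions; per-stage work is a toy.
SENTINEL = None

def stage(name, inbox, outbox, work):
    """Pull items from inbox, apply work, pass results on until SENTINEL."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            if outbox is not None:
                outbox.put(SENTINEL)  # tell the next stage to stop too
            break
        result = work(item)
        if outbox is not None:
            outbox.put(result)

def run_pipeline(raw_lines):
    parsing_q, agg_q, typing_q, index_q = (queue.Queue() for _ in range(4))
    indexed = []  # stands in for the on-disk index
    stages = [
        ("parsing", parsing_q, agg_q, lambda s: s.strip()),                    # toy line cleanup
        ("merging", agg_q, typing_q, lambda s: {"raw": s}),                    # toy event building
        ("typing", typing_q, index_q, lambda e: {**e, "len": len(e["raw"])}),  # toy field extraction
        ("indexing", index_q, None, indexed.append),                           # toy "write to index"
    ]
    threads = [threading.Thread(target=stage, args=s, name=s[0]) for s in stages]
    for t in threads:
        t.start()
    for line in raw_lines:
        parsing_q.put(line)
    parsing_q.put(SENTINEL)
    for t in threads:
        t.join()
    return indexed

events = run_pipeline(["  error disk full \n", " user login ok\n"])
print(events)  # [{'raw': 'error disk full', 'len': 15}, {'raw': 'user login ok', 'len': 13}]
```

Because each stage has a single worker reading a FIFO queue, events come out in the order they went in.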
You mean these 4 distinct queues are the ones we usually see in PPT slides explaining the indexing process, like typing, parsing, etc.? (If I recall correctly... not sure. I will look back.)
The first option will be better, as we can create a cluster and use the power of Splunk replication.
Yes, that is obvious. I wasn't trying to look at the cluster feature.
I'm talking about performance and the overall pluses and minuses of each.
Thanks btw