Distributed environment scaling via commodity hardware or one big giant virtual machine?

deodion · 08-03-2016

We have seen the reference hardware specs for performance and scaling, but how about the case below?

What is the difference between (let's say):
3 x servers with spec:
12 physical cores, 32GB RAM, 800 IOPS per server

versus

ONE BIG giant virtual machine with spec:
36 physical cores, 96GB RAM, 2400 IOPS

Many thanks,

sdvorak_splunk · 08-06-2016

Correct:
https://wiki.splunk.com/Community:HowIndexingWorks

sdvorak_splunk · 08-04-2016

Unless you are dedicating resources to the virtual machine, it is unlikely that it will perform as well as 3 smaller physical machines. In fact, you will incur a 10% performance penalty for indexing simply by running virtually (worse if there is contention for CPU, RAM, or disk).
Also, a single indexing pipeline would use 4 CPUs per machine, so 3 servers would have 12 CPUs' worth of indexing horsepower. As of 6.3 you can have up to 2 (max recommended) indexing pipelines, which would give you 8 CPUs' worth of indexing horsepower on the "big server". That is less than the 3 smaller servers, but it would leave more CPUs available for handling search. If that meets your needs, it's probably fine.
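To make the arithmetic above concrete, here is a rough sketch in Python. It only encodes the rules of thumb from this post (about 4 CPUs per indexing pipeline, a 10% penalty when virtualized); the function names are made up for illustration, and none of this is a measured benchmark.

```python
# Rough capacity arithmetic for the two layouts discussed in this thread.
# Assumptions (rules of thumb from the post, not measured values):
CPUS_PER_PIPELINE = 4    # one indexing pipeline keeps roughly 4 cores busy
VIRTUAL_PENALTY = 0.10   # ~10% indexing penalty when running virtualized


def indexing_cores(servers, pipelines_per_server, virtual=False):
    """Cores' worth of indexing horsepower across all servers."""
    cores = servers * pipelines_per_server * CPUS_PER_PIPELINE
    return cores * (1 - VIRTUAL_PENALTY) if virtual else cores


def cores_left_for_search(servers, cores_per_server, pipelines_per_server):
    """Cores not consumed by indexing pipelines, summed over all servers."""
    used_per_server = pipelines_per_server * CPUS_PER_PIPELINE
    return servers * (cores_per_server - used_per_server)


# Option A: 3 physical servers, 12 cores each, 1 pipeline each.
print("3 small servers:",
      indexing_cores(3, 1), "indexing cores,",
      cores_left_for_search(3, 12, 1), "cores left for search")

# Option B: 1 virtual machine, 36 cores, 2 pipelines.
print("1 big VM:",
      indexing_cores(1, 2, virtual=True), "effective indexing cores,",
      cores_left_for_search(1, 36, 2), "cores left for search")
```

Under these assumptions the three small servers come out ahead on indexing, while the single large VM keeps a few more cores free for search.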
Personally, if I were setting this up, I would want the 3 servers with known dedicated resources and the inherent redundancy that comes with them. But if you are going for ease of use and simplified maintenance/administration, 1 server does fit the bill.

Info on indexing parallelization:
http://docs.splunk.com/Documentation/Splunk/6.3.0/Capacity/Parallelization
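
For reference, a minimal sketch of how a second indexing pipeline is typically enabled on an indexer. It assumes the `parallelIngestionPipelines` setting in the `[general]` stanza of server.conf, as covered by the parallelization docs linked above; check the docs for your version before relying on it, and expect to restart Splunk for the change to take effect.

```ini
# $SPLUNK_HOME/etc/system/local/server.conf
# Assumption: 2 pipelines, the recommended maximum mentioned above.
[general]
parallelIngestionPipelines = 2
```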

sdvorak_splunk · 08-04-2016

I should add: if you go with the larger system, your only expansion option is to add another identical large system, whereas with 3 smaller servers you could add a single identical smaller system.
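
As a quick illustration of that expansion granularity, here is a small Python sketch. The helper name `expansion_step` is hypothetical, and it looks only at core counts from the original question, not at cost or licensing.

```python
def expansion_step(current_nodes, cores_per_node):
    """Return (cores added, percent growth) when adding one identical node."""
    added = cores_per_node
    growth_pct = 100 * added / (current_nodes * cores_per_node)
    return added, growth_pct


# One big 36-core system: the next step doubles the environment.
print("big system:", expansion_step(1, 36))

# Three 12-core servers: the next step adds about a third.
print("small servers:", expansion_step(3, 12))
```

The smaller building block lets you grow in steps of 12 cores instead of 36.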

deodion · 08-04-2016

Where can I find a reference doc/link for the statement that a single indexing pipeline would use 4 CPUs per machine?

I had never heard before that a single indexer has an upper limit on how much CPU it can use for indexing, or perhaps for searching as well.

Thanks,

sdvorak_splunk · 08-05-2016

I'm not sure there is a reference in the docs to the 4-processor usage for indexing. However, there are four distinct queues and processes in the indexing pipeline. Each of those tends to use the better part of a CPU. That is the reason we say that 4 CPUs are used per pipeline.

deodion · 08-05-2016

You mean these 4 distinct queues are the ones we usually see in the PPT slides explaining the indexing process, like typing, parsing, etc.? (If I recall correctly; not sure, I will look back.)

hardikJsheth · 08-04-2016

The first one will be better, as we can create a cluster and use the power of Splunk replication.

deodion · 08-04-2016

Yes, that is obvious. I wasn't asking about the cluster feature.

I'm talking about performance and the overall pluses and minuses of each option.

Thanks btw
