Pros and Cons of running all Docker Swarm nodes as Managers?

Tags:

docker-swarm

I am considering building out a Docker Swarm cluster. To keep things both simple and relatively fault-tolerant, I thought about simply running 3 nodes as managers.

What are the trade-offs when not using any dedicated worker nodes? Is there anything I should be aware of that might not be obvious?

I found this GitHub issue which asks a similar question, but the answer is a bit ambiguous to me. It mentions that performance may be worse. It also mentions that it will take longer to reach consensus. In practice, what functionality would be slower? And what does "take longer to reach consensus" actually affect?

asked Feb 18 '18 15:02

Kurtis

2 Answers

TL;DR pros and cons of all managers as workers in Swarm:

Pros:

Prod-quality HA with only 3 or 5 servers
Simplicity of design/management
Still secure by default (secrets are encrypted on disk, mutual TLS auth and network encryption on the control plane)
Any node can administer the Swarm

Cons:

Requires tighter management of resources to prevent manager starvation
Weaker security posture, since secrets/keys are stored on the app servers
A compromised node means the whole Swarm could easily be compromised
Limited to an odd number of servers, usually 3 or 5

Full Answers to Your Questions

What are the trade-offs when not using any dedicated worker nodes? Is there anything I should be aware of that might not be obvious?

There are no hard requirements for using worker-only nodes. If you're deploying a solution where you know what resources you need, and the number of services/tasks is usually the same, there's nothing wrong with a Swarm of just three managers doing all the work, as long as you have considered the three areas that are affected:

Security. In a perfect world, your managers would not be internet accessible and would only be on a backend subnet, doing only manager work. The managers have all the authority for the Swarm, hold all the encrypted secrets, store the encrypted Raft log, and also (by default) store the encryption keys on disk. Workers only store the secrets they need (and only in memory) and have no authority to do any work in the Swarm other than what they've been told to do by the leader. If a worker gets compromised, you haven't necessarily "lost the Swarm". This separation of powers is not a hard requirement, and many environments accept the risk and use the managers as the main servers that publish services to the public. It's just a question of security/complexity vs. cost.
Node count. The minimum number of managers for redundancy is 3, and 3 or 5 is what I recommend most of the time. More managers do not equal more capacity, as only one manager is the leader at any time, and it is the only one doing manager work. The resource capacity of the leader determines how much work it can do simultaneously. If your managers are also doing app work, and you need more resource capacity than 3 nodes can handle, then I'd recommend that the 4th node and higher are just workers.
Performance/scale. Ideally, your managers have all the resources they need to do things fast, like leader election, task scheduling, running and reacting to healthchecks, etc. Their resource utilization grows with the total number of nodes, the total number of services, and the rate of new work they have to perform (service/network creation, task changes, node changes, healthchecks, etc.). If you have a small number of servers and a small number of services/replicas, then you could likely have the managers also be workers, as long as you're careful (use resource limits on services) to prevent your apps (especially databases) from starving the Docker daemon of resources so badly that Swarm can't do its job. When you start having random leader changes or errors/failures, you would want "check the managers for available resources" on your short list of troubleshooting steps.

Other questions:

In practice, what functionality would be slower? And what does "take longer to reach consensus" actually affect?

More managers means it takes longer for the managers to elect a new leader when one goes down. While there is no leader, the Swarm is in a read-only state: new replica tasks cannot be launched and service updates won't happen. Any container that fails won't auto-recover because the Swarm managers can't do work. Your running apps, the ingress routing mesh, etc. all still function.

The performance of manager health and leader election is tied to network latency between all manager nodes as much as it is to the number of managers. This is why Docker generally advises that a single Swarm's managers all be in the same region, so they get a low-latency round trip between each other. There is no hard-and-fast rule here. If you run with 200ms latency between managers, test failures, and are fine with the results and the speed of leader election, cool.

Background info:

Swarm Admin Guide
Laura Frank's DockerCon talk on Swarm/Raft internals and recovery
My DockerCon talk on Swarm production considerations/design
Nico Kabar's DockerCon talk on Enterprise Swarm considerations
(If you're going big) Running Docker EE at scale
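
To make the "Node count" point above concrete, here is a minimal sketch of the majority arithmetic behind it. It is not taken from Docker's code; it only applies the Raft rule that a quorum is more than half of the managers, and the function names are made up for illustration.

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that forms a majority."""
    return managers // 2 + 1


def failure_tolerance(managers: int) -> int:
    """How many managers can be lost while a majority still remains."""
    return managers - quorum(managers)


if __name__ == "__main__":
    for n in range(1, 8):
        print(
            f"{n} managers: quorum {quorum(n)}, "
            f"tolerates {failure_tolerance(n)} failure(s)"
        )
```

With 3 managers the Swarm survives one failure. A 4th manager raises the quorum to 3 without raising the tolerance, which is one more reason to add the 4th node and higher as workers.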

Bret Fisher

It all depends on the aim of building the cluster. For development purposes, you can let the manager nodes double as workers. The real concern is scaling out: if you expect your microservices infrastructure to keep growing, consider separating worker and manager nodes so that scaling out is easier.

The pros of your setup are:

Ease of administration
The setup is highly available: 3 manager nodes give a failure tolerance of 1

Cons are:

Not good for scaling out: growing container compute demands mean adding more worker nodes anyway.
Additional manager nodes reduce write performance because more nodes must acknowledge proposals to update the swarm state. This means more network round-trip traffic between the managers.
If your dockerized application interferes with the host system, it will affect the manager services. If the managers lose their quorum as a result, swarm tasks will continue to run, but swarm nodes cannot be added, updated, or removed, and new or existing tasks cannot be started, stopped, moved, or updated. Isolating manager and worker services is safer.
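
As a rough illustration of why acknowledgements and round trips matter, the sketch below models a single state update under Raft, which Swarm managers use: the leader counts itself and then waits for enough followers to acknowledge that a majority is reached. This is a deliberately simplified model, not Docker's implementation. It ignores disk writes, retries, and leader load, and the function name and latency figures are hypothetical.

```python
from typing import Sequence


def commit_latency_ms(follower_rtts_ms: Sequence[float]) -> float:
    """Estimate how long the leader waits before one update is committed.

    follower_rtts_ms holds the round-trip time from the leader to each
    other manager. The leader needs acknowledgements from enough
    followers to form a majority together with itself, so the wait is
    set by the slowest follower inside that majority.
    """
    managers = len(follower_rtts_ms) + 1   # followers plus the leader
    acks_needed = managers // 2            # majority minus the leader itself
    if acks_needed == 0:
        return 0.0                         # single manager: nothing to wait for
    return sorted(follower_rtts_ms)[acks_needed - 1]


if __name__ == "__main__":
    # Hypothetical round-trip times in milliseconds.
    print(commit_latency_ms([2.0, 200.0]))                # 3 managers, one far away
    print(commit_latency_ms([2.0, 200.0, 200.0, 200.0]))  # 5 managers, three far away
```

In this model, a 3-manager swarm with one distant manager still commits at the pace of the nearby one (2 ms), while a 5-manager swarm with three distant managers has to wait for a distant one (200 ms). That is consistent with the advice to keep managers few and close together.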

answered Oct 19 '22 22:10

Ben Schmeltzer
