
Pros and Cons of running all Docker Swarm nodes as Managers?

Tags:

docker-swarm

I am considering building out a Docker Swarm cluster. For the purpose of keeping things both simple and relatively fault-tolerant, I thought about simply running 3 nodes as managers.

What are the trade-offs when not using any dedicated worker nodes? Is there anything I should be aware of that might not be obvious?

I found this GitHub issue, which asks a similar question, but the answer is a bit ambiguous to me. It mentions that performance may be worse and that it will take longer to reach consensus. In practice, what functionality would be slower? And what does "take longer to reach consensus" actually affect?

Kurtis asked Feb 18 '18 15:02



2 Answers

TL;DR pros and cons of all managers as workers in Swarm:

Pros:

  • Prod-quality HA with only 3 or 5 servers
  • Simplicity of design/management
  • Still secure by default (secrets are encrypted on disk, mutual TLS auth and network encryption on control plane)
  • Any node can administer the Swarm

Cons:

  • Requires tighter resource management to prevent manager starvation
  • Weaker security posture: secrets and keys are stored on app servers
  • A compromised node means the whole Swarm could easily be compromised
  • Limited to an odd number of servers, usually 3 or 5
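
The odd-number point follows from majority arithmetic in Raft: the managers need more than half of their number available to keep a quorum. A minimal sketch of that arithmetic, where the function names `quorum` and `failures_tolerated` are hypothetical helpers for illustration and not part of Docker:

```python
def quorum(managers: int) -> int:
    """Smallest majority of a manager set of the given size."""
    return managers // 2 + 1


def failures_tolerated(managers: int) -> int:
    """How many managers can be lost while a majority remains."""
    return managers - quorum(managers)


# Print the quorum size and failure tolerance for 1 to 7 managers.
for n in range(1, 8):
    print(f"{n} managers: quorum {quorum(n)}, "
          f"tolerates {failures_tolerated(n)} failure(s)")
```

Going from 3 to 4 managers raises the quorum from 2 to 3 but still tolerates only one failure, which is why an even count adds cost without adding resilience.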

Full Answers to Your Questions

What are the trade-offs when not using any dedicated worker nodes? Is there anything I should be aware of that might not be obvious?

There are no hard requirements for using worker-only nodes. If you're deploying a solution where you know what resources you need, and the number of services/tasks is usually the same, there's nothing wrong with a Swarm of just three managers doing all the work, as long as you have considered the three areas this affects:

  1. Security. In a perfect world, your managers would not be internet-accessible; they would sit on a backend subnet and do only manager work. The managers have all the authority for the Swarm, hold all the encrypted secrets, store the encrypted Raft log, and also (by default) store the encryption keys on disk. Workers store only the secrets they need (and only in memory) and have no authority to do any work in the Swarm other than what the leader has told them to do. If a worker gets compromised, you haven't necessarily "lost the Swarm". This separation of powers is not a hard requirement, and many environments accept the risk and make the managers the main servers that publish services to the public. It's just a question of security/complexity vs. cost.
  2. Node count. The minimum number of managers for redundancy is 3, and 3 or 5 is what I recommend most of the time. More managers do not equal more capacity, because only one manager is the leader at any time, and it is the only one doing manager work. The resource capacity of the leader determines how much work it can do simultaneously. If your managers are also doing app work and you need more resource capacity than 3 nodes can handle, then I'd recommend that the 4th node and higher be just workers.
  3. Performance/scale. Ideally, your managers have all the resources they need to do things fast: leader election, task scheduling, running and reacting to healthchecks, etc. Their resource utilization grows with the total number of nodes, the total number of services, and the rate of new work they have to perform (service/network creation, task changes, node changes, healthchecks, etc.). If you have a small number of servers and a small number of services/replicas, you could likely have the managers also be workers, as long as you're careful (use resource limits on services) to prevent your apps (especially databases) from starving the Docker daemon of resources so badly that Swarm can't do its job. If you start seeing random leader changes or errors/failures, put "check the managers for available resources" on your short list of troubleshooting steps.
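
One way to apply the resource limits mentioned in point 3 is to cap each service when it is created. The sketch below only builds and prints a `docker service create` command; it does not run it. The `--limit-cpu`, `--limit-memory`, and `--reserve-memory` flags are real Docker CLI options, while the helper name `service_create_argv`, the service name `db`, the image tag, and the numbers are all assumptions chosen for illustration.

```python
import shlex


def service_create_argv(name, image, replicas, cpu_limit, mem_limit, mem_reserve):
    """Build (but do not run) a `docker service create` command with resource caps."""
    return [
        "docker", "service", "create",
        "--name", name,
        "--replicas", str(replicas),
        "--limit-cpu", str(cpu_limit),     # hard CPU cap per task
        "--limit-memory", mem_limit,       # hard memory cap per task
        "--reserve-memory", mem_reserve,   # memory the scheduler sets aside per task
        image,
    ]


# Hypothetical values: size these from what your own app actually needs.
argv = service_create_argv("db", "postgres:16", 1, 1.5, "2g", "1g")
print(shlex.join(argv))
```

Capping every app service this way leaves a known amount of headroom on each node for the Docker daemon and the Raft store, which is the point of the "be careful" advice above.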

Other questions:

In practice, what functionality would be slower? And what does "take longer to reach consensus" actually affect?

More managers = longer for managers to elect a new leader when one goes down. While there is no leader, the Swarm is in a read-only state: new replica tasks cannot be launched, service updates won't happen, and any container that fails won't auto-recover, because the Swarm managers can't do work. Your running apps, the ingress routing mesh, etc. all still function. Manager health and leader-election performance depend on network latency between the manager nodes as much as on the number of managers. This is why Docker generally advises that a single Swarm's managers all be in the same region, so they get a low-latency round trip between each other. There is no hard-and-fast rule here. If you test with 200ms latency between managers, test failures, and are fine with the results and the speed of leader election, cool.

Background info:

  • Swarm Admin Guide
  • Laura Frank's DockerCon talk on Swarm/Raft internals and recovery
  • My DockerCon talk on Swarm production considerations/design
  • Nico Kabar DockerCon talk on Enterprise Swarm considerations
  • (If you're going big) Running Docker EE at scale

Bret Fisher answered Oct 19 '22 23:10


It all depends on the aim of building the cluster. For development purposes, you can use the same nodes as both workers and managers. The real concern is scaling out: if you expect your microservices infrastructure to keep growing, consider separating worker and manager nodes so that scaling out is easier.

The pros of your setup are:

  • Ease of administration

  • The setup is highly available: 3 nodes means a failure tolerance of 1

Cons are:

  • Not good for scaling out: growing container compute demands mean adding more worker nodes.

  • Additional manager nodes reduce write performance because more nodes must acknowledge proposals to update the swarm state. This means more network round-trip traffic, which can cause performance issues for your services.

  • If your dockerized application interferes with the host system, manager services are affected too. If that costs the managers their quorum, Swarm tasks will continue to run, but swarm nodes cannot be added, updated, or removed, and new or existing tasks cannot be started, stopped, moved, or updated. Isolating manager and worker services is safer.
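
To make the write-performance point concrete, here is a rough sketch of how long a leader waits before a state change counts as acknowledged by a majority. It is a simplified model, not Swarm's actual implementation: it assumes the leader counts itself, waits only for the fastest followers needed to reach a majority, and ignores disk and CPU time. The function name `commit_latency_ms` and the round-trip values are hypothetical.

```python
def commit_latency_ms(follower_rtts_ms):
    """Estimate the wait for a majority acknowledgement, in milliseconds.

    follower_rtts_ms: round-trip times from the leader to each other manager.
    """
    managers = len(follower_rtts_ms) + 1
    acks_needed = managers // 2  # majority size minus the leader itself
    if acks_needed == 0:
        return 0.0  # a single manager has nobody to wait for
    # The slowest of the fastest `acks_needed` followers sets the pace.
    return float(sorted(follower_rtts_ms)[acks_needed - 1])


# 3 managers, one of them far away: the nearby follower is enough.
print(commit_latency_ms([2, 200]))
# 5 managers, three of them far away: a distant follower is now required.
print(commit_latency_ms([2, 200, 200, 200]))
```

Under this model, adding managers only hurts when the extra acknowledgements have to come from slow or distant nodes, which is consistent with the advice to keep managers few and close together.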

Ben Schmeltzer answered Oct 19 '22 22:10