It seems my server ran out of space, and I was having problems with some of the deployed Docker stacks. It took me a while to figure out, but eventually I did, and I removed a couple of containers and images to free up some space.
I was able to run service docker restart and it worked. However, there are some problems:
- docker info says the swarm is "Pending".
- docker node ls shows the only node I have (the Leader); its availability is Active, but its status is Down.
- journalctl -f -u docker logs errors like level=error msg="error removing task " error="incompatible value module=node/agent/worker node.id=" (truncated).
- docker service ls shows 0/1 replicas for all services.
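For reference, the swarm state that docker info reports can be read directly with a Go template (a sketch):

docker info --format '{{.Swarm.LocalNodeState}}'    # prints e.g. "pending" or "active"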
This is the node's status (presumably from docker node inspect, since docker node ls does not print JSON):
"Status": {
"State": "down",
"Message": "heartbeat failure for node in \"unknown\" state",
"Addr": "<ip and port>"
},
"ManagerStatus": {
"Leader": true,
"Reachability": "reachable",
"Addr": "<ip and port>"
}
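Fragments like these can be extracted with docker node inspect and a Go template (a sketch; "self" resolves to the node the command runs on):

docker node inspect self --format '{{json .Status}}'
docker node inspect self --format '{{json .ManagerStatus}}'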
How can I get my services running again?
If a worker node becomes unavailable, Docker schedules that node's tasks on other nodes. A task is a running container that is part of a swarm service and managed by a swarm manager, as opposed to a standalone container.
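To see why a service's tasks are not running, you can list them with docker service ps (a sketch; "my_stack_web" is a placeholder for a name from docker service ls):

docker service ps --no-trunc my_stack_web    # the ERROR column shows why each task failed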
If the manager in a single-manager swarm fails, your services continue to run, but you need to create a new cluster to recover. To take advantage of swarm mode's fault-tolerance features, Docker recommends you implement an odd number of manager nodes according to your organization's high-availability requirements.
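If the manager's on-disk state is still intact, the documented way to recover a single-manager swarm is to force a new cluster from that state (a sketch; run it on the manager node):

docker swarm init --force-new-cluster    # rebuilds a single-manager swarm from the existing state,
                                         # keeping services, networks, configs, and secrets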
Leave the swarm:
Run the docker swarm leave command on a node to remove it from the swarm. For example, to leave the swarm on a worker node:

$ docker swarm leave
Node left the swarm.

When a node leaves the swarm, the Docker Engine stops running in swarm mode.
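Note that on a manager node, docker swarm leave refuses to run unless you add --force; on a single-node swarm this dissolves the swarm and discards its services:

docker swarm leave --force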
You can use the --force or -f flag with the docker service update command to force the service to redistribute its tasks across the available worker nodes. This causes the service tasks to restart.
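A sketch of that, for a single service and for all services at once ("my-service" is a placeholder; real names come from docker service ls):

docker service update --force my-service
for s in $(docker service ls -q); do      # or force-update every service
  docker service update --force "$s"
done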
Sometimes when you restart Docker or update your Docker version, the tasks.db file gets corrupted.
This is an open issue (#34827). Some people have suggested a workaround: move the tasks.db file elsewhere and test whether this fixes the issue; if it does, delete the tasks.db file. Docker will automatically create a new one for you.
You can find the tasks.db file in /var/lib/docker/swarm/worker/
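A sketch of that workaround (the backup location under /tmp is arbitrary):

systemctl stop docker
mv /var/lib/docker/swarm/worker/tasks.db /tmp/tasks.db.bak    # move rather than delete, so it can be restored
systemctl start docker
# if the services come back up, the backup can be deleted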
I've faced the same issue recently, and this workaround saved my day. I didn't lose any data related to my stacks.
Update (October 19, 2020): issue #34827 is now closed, but the solution is still the same: remove the tasks.db file.
Option 1:
Wait. Sometimes it fixes itself.
Option 2 (may vary depending on OS; note that this wipes all swarm state, so you will have to redeploy your stacks):
systemctl stop docker            # stop the Docker daemon
rm -Rf /var/lib/docker/swarm     # remove all swarm state (services, networks, configs, and secrets are lost)
systemctl start docker           # start the daemon again, now out of swarm mode
docker swarm init                # create a fresh single-node swarm