Docker Node is Down after service restart

It seems my server ran out of disk space, and I was having problems with some of the deployed Docker stacks. It took me a while to figure that out, but eventually I did, and I removed a couple of containers and images to free up some space.

I was able to run `service docker restart`, and it worked. However, there are some problems:

  • `docker info` says the swarm is "Pending"
  • `docker node ls` shows the only node I have (the Leader) as available, but its status is Down
  • `journalctl -f -u docker` says `level=error msg="error removing task " error="incompatible value module=node/agent/worker node.id="`

When I run `docker service ls`, all services show 0/1 replicas.

This is the node's status, as reported by `docker node inspect`:

"Status": {
    "State": "down",
    "Message": "heartbeat failure for node in \"unknown\" state",
    "Addr": "<ip and port>"
},
"ManagerStatus": {
    "Leader": true,
    "Reachability": "reachable",
    "Addr": "<ip and port>"
}
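To collect the symptoms above in one place, a small helper script can be handy. This is a sketch that is not part of the original question: it assumes Bash and a Docker CLI that supports `--format` on `docker info`, `docker node inspect` and `docker service ls`, and the function names `list_stalled_services` and `swarm_diagnostics` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helpers for a single-node swarm.
# Nothing runs until you call the functions.

# Read "NAME REPLICAS" lines (e.g. "web 0/1") on stdin and print the
# names of services whose running replica count is 0.
list_stalled_services() {
  awk '{ split($2, r, "/"); if (r[1] == "0") print $1 }'
}

# Print the local swarm state, this node's status and the stalled services.
swarm_diagnostics() {
  echo "Swarm state: $(docker info --format '{{.Swarm.LocalNodeState}}')"
  docker node inspect self --format 'Node: {{.Status.State}} ({{.Status.Message}})'
  echo "Services with no running replicas:"
  docker service ls --format '{{.Name}} {{.Replicas}}' | list_stalled_services
}
```

For the daemon side, `journalctl -u docker -n 50 --no-pager` prints the most recent log lines, which is where the `error removing task` message shows up.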

How can I get my services running again?

asked Jun 19 '18 at 16:06 by Christopher Francisco


People also ask

What happens to a worker node if it becomes unavailable in docker?

If a worker node becomes unavailable, Docker schedules that node's tasks on other nodes. A task is a running container that is part of a swarm service and is managed by a swarm manager, as opposed to a standalone container.

What happens if docker swarm manager goes down?

If the manager in a single-manager swarm fails, your services continue to run, but you need to create a new cluster to recover. To take advantage of swarm mode's fault-tolerance features, Docker recommends you implement an odd number of nodes according to your organization's high-availability requirements.

How do I get out of swarm mode?

Run the `docker swarm leave` command on a node to remove it from the swarm. For example, to leave the swarm on a worker node:

$ docker swarm leave
Node left the swarm.

When a node leaves the swarm, the Docker Engine stops running in swarm mode.
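On a manager node, plain `docker swarm leave` is refused and `--force` is required. A minimal sketch of a helper that picks the right form, assuming Bash and that `docker info` reports `Swarm.ControlAvailable` as `true` on managers; `leave_swarm` is a hypothetical name. Note that forcing the last manager out discards the swarm's state.

```shell
#!/usr/bin/env bash
# Hypothetical helper: leave the swarm, adding --force only on managers,
# where a plain "docker swarm leave" is rejected.
leave_swarm() {
  local is_manager
  is_manager="$(docker info --format '{{.Swarm.ControlAvailable}}')"
  if [ "$is_manager" = "true" ]; then
    # On the only (or last) manager this removes the swarm's state entirely.
    docker swarm leave --force
  else
    docker swarm leave
  fi
}
```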

How do I restart Docker services on a swarm node?

You can use the --force or -f flag with the docker service update command to force the service to redistribute its tasks across the available worker nodes. This causes the service tasks to restart.
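Applied to every service at once, this could look like the sketch below. It assumes Bash and uses `--detach` so each update returns immediately instead of waiting for convergence, which might otherwise block while the node is still down; `force_update_all_services` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: force every service to redistribute (restart) its tasks.
force_update_all_services() {
  local id
  # "docker service ls -q" prints one service ID per line.
  for id in $(docker service ls -q); do
    docker service update --force --detach "$id"
  done
}
```

Rescheduling can only succeed once at least one node is Ready, so on a single-node swarm run this after the node has recovered.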


2 Answers

Sometimes, when you restart Docker or update your Docker version, the `tasks.db` file gets corrupted.

This is an open issue (#34827). Some people have suggested a workaround: move the `tasks.db` file somewhere else, test whether that fixes the issue, and if it does, delete the moved file. Docker will automatically create a new one for you.

You can find the `tasks.db` file in `/var/lib/docker/swarm/worker/`.

I faced the same issue recently, and this workaround saved my day. I didn't lose any data related to my stacks.
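The workaround can be scripted roughly as follows. This is a sketch, not the answerer's exact procedure: it assumes a systemd host with the default data root `/var/lib/docker`, and it stops the daemon before moving the file (an assumption; the answer does not say whether that is required). `SWARM_WORKER_DIR`, `backup_tasks_db` and `reset_tasks_db` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch of the tasks.db workaround (issue #34827), not an official fix.
# Assumes systemd and the default data root; SWARM_WORKER_DIR is a
# hypothetical override for hosts with a different layout.
SWARM_WORKER_DIR="${SWARM_WORKER_DIR:-/var/lib/docker/swarm/worker}"

# Move DIR/tasks.db to a timestamped backup next to it and print the
# backup path. Returns 1 if there is no tasks.db to move.
backup_tasks_db() {
  local dir="$1" backup
  if [ ! -f "$dir/tasks.db" ]; then
    echo "no tasks.db in $dir" >&2
    return 1
  fi
  backup="$dir/tasks.db.bak.$(date +%s)"
  mv "$dir/tasks.db" "$backup" || return 1
  echo "$backup"
}

# Stop the daemon, move tasks.db aside and start the daemon again so it
# creates a fresh tasks.db. The daemon is restarted even if the move fails.
reset_tasks_db() {
  systemctl stop docker || return 1
  backup_tasks_db "$SWARM_WORKER_DIR"
  local status=$?
  systemctl start docker
  return "$status"
}
```

After `reset_tasks_db`, check `docker node ls` and `docker service ls`. If the node is Ready and the replicas come back, delete the printed backup file as the answer suggests; otherwise stop Docker and move the file back.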

Update (October 19, 2020):

Issue #34827 is closed, but the solution is still the same: remove the `tasks.db` file.

answered Sep 24 '22 at 03:09 by Yor Jaggy


Option 1:

Wait. Sometimes it fixes itself.
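Waiting can be automated by polling the node's status. A minimal sketch, assuming Bash and that `docker node inspect self` answers on this manager; `wait_for_node_ready` is a hypothetical name, and the attempt count and interval are up to you.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll this node's status until it reports "ready".
# Usage: wait_for_node_ready ATTEMPTS INTERVAL_SECONDS
# Returns 0 once the node is ready, 1 if it never became ready.
wait_for_node_ready() {
  local attempts="$1" interval="$2" i=0 state
  while [ "$i" -lt "$attempts" ]; do
    state="$(docker node inspect self --format '{{.Status.State}}' 2>/dev/null)"
    if [ "$state" = "ready" ]; then
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_for_node_ready 30 10` polls every 10 seconds for about five minutes before giving up.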

Option 2 (commands may vary depending on your OS):

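# Run as root. Without systemd, use "service docker stop" / "service docker start".
systemctl stop docker
# WARNING: this wipes all swarm state (services, stacks, secrets, configs).
rm -Rf /var/lib/docker/swarm
systemctl start docker
# Add --advertise-addr <ip> if the host has more than one IP address.
docker swarm init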
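Because `rm -Rf` cannot be undone, a more cautious variant of Option 2 archives the swarm directory before removing it, so the old state can still be inspected or restored. This is a sketch assuming a systemd host, the default data root `/var/lib/docker` and `/root` as the archive destination; `archive_swarm_state` and `recreate_swarm` are hypothetical names. Stacks and services still have to be redeployed after `docker swarm init`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tar up DATA_ROOT/swarm into DEST, then remove it.
# Only run this while the Docker daemon is stopped. Prints the archive path.
archive_swarm_state() {
  local data_root="$1" dest="$2" archive
  archive="$dest/swarm-backup-$(date +%Y%m%d%H%M%S).tar.gz"
  tar -czf "$archive" -C "$data_root" swarm || return 1
  rm -rf "$data_root/swarm"
  echo "$archive"
}

# Option 2 with an archive instead of a plain delete (systemd assumed).
recreate_swarm() {
  systemctl stop docker || return 1
  archive_swarm_state /var/lib/docker /root || { systemctl start docker; return 1; }
  systemctl start docker
  docker swarm init
}
```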

answered Sep 22 '22 at 03:09 by Javier Yáñez