It seems my server ran out of space, and I was having problems with some of the deployed Docker stacks. It took me a while to figure out, but eventually I did, and I removed a couple of containers and images to free up some space.
I was able to run service docker restart and it worked. However, there are some problems:
- docker info says the swarm is "Pending".
- docker node ls shows the only node I have (the Leader); its availability is Active, but its status is Down.
- journalctl -f -u docker logs errors like level=error msg="error removing task " error="incompatible value module=node/agent/worker node.id=" (truncated).
- docker service ls shows 0/1 replicas for all services.
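For reference, the swarm state that docker info reports can be read directly with a Go template (a sketch):

docker info --format '{{.Swarm.LocalNodeState}}'    # prints e.g. "pending" or "active"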
This is the node's status (presumably from docker node inspect, since docker node ls does not print JSON):
"Status": {
"State": "down",
"Message": "heartbeat failure for node in \"unknown\" state",
"Addr": "<ip and port>"
},
"ManagerStatus": {
"Leader": true,
"Reachability": "reachable",
"Addr": "<ip and port>"
}
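Fragments like these can be extracted with docker node inspect and a Go template (a sketch; "self" resolves to the node the command runs on):

docker node inspect self --format '{{json .Status}}'
docker node inspect self --format '{{json .ManagerStatus}}'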
How can I get my services running again?
If a worker node becomes unavailable, Docker schedules that node's tasks on other nodes. A task is a running container that is part of a swarm service and managed by a swarm manager, as opposed to a standalone container.
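To see why a service's tasks are not running, you can list them with docker service ps (a sketch; "my_stack_web" is a placeholder for a name from docker service ls):

docker service ps --no-trunc my_stack_web    # the ERROR column shows why each task failed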
If the manager in a single-manager swarm fails, your services continue to run, but you need to create a new cluster to recover. To take advantage of swarm mode's fault-tolerance features, Docker recommends you implement an odd number of manager nodes according to your organization's high-availability requirements.
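If the manager's on-disk state is still intact, the documented way to recover a single-manager swarm is to force a new cluster from that state (a sketch; run it on the manager node):

docker swarm init --force-new-cluster    # rebuilds a single-manager swarm from the existing state,
                                         # keeping services, networks, configs, and secrets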
Leave the swarm:
Run the docker swarm leave command on a node to remove it from the swarm. For example, to leave the swarm on a worker node:

$ docker swarm leave
Node left the swarm.

When a node leaves the swarm, the Docker Engine stops running in swarm mode.
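Note that on a manager node, docker swarm leave refuses to run unless you add --force; on a single-node swarm this dissolves the swarm and discards its services:

docker swarm leave --force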
You can use the --force or -f flag with the docker service update command to force the service to redistribute its tasks across the available worker nodes. This causes the service tasks to restart.
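A sketch of that, for a single service and for all services at once ("my-service" is a placeholder; real names come from docker service ls):

docker service update --force my-service
for s in $(docker service ls -q); do      # or force-update every service
  docker service update --force "$s"
done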
Sometimes when you restart Docker or update your Docker version, the tasks.db file gets corrupted.
This is an open issue (#34827). Some people have suggested a workaround: move the tasks.db file elsewhere and test whether this fixes the issue; if it does, delete the tasks.db file. Docker will automatically create a new one for you.
You can find the tasks.db file in /var/lib/docker/swarm/worker/
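A sketch of that workaround (the backup location under /tmp is arbitrary):

systemctl stop docker
mv /var/lib/docker/swarm/worker/tasks.db /tmp/tasks.db.bak    # move rather than delete, so it can be restored
systemctl start docker
# if the services come back up, the backup can be deleted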
I've faced the same issue recently, and this workaround saved my day. I didn't lose any data related to my stacks.
Update (October 19, 2020): issue #34827 is now closed, but the solution is still the same: remove the tasks.db file.
Option 1:
Wait. Sometimes it fixes itself.
Option 2 (may vary depending on OS; note that this wipes all swarm state, so you will have to redeploy your stacks):
systemctl stop docker            # stop the Docker daemon
rm -Rf /var/lib/docker/swarm     # remove all swarm state (services, networks, configs, and secrets are lost)
systemctl start docker           # start the daemon again, now out of swarm mode
docker swarm init                # create a fresh single-node swarm