I've been deploying stacks to swarms with the start-first option for quite a while now.
Given the following api.yml file:
    version: '3.4'
    services:
      api:
        image: registry.gitlab.com/myproj/api:${VERSION}
        deploy:
          update_config:
            order: start-first
I would run the following command against a swarm manager:
    env VERSION=x.y.z docker stack deploy -c api.yml api
This worked fine - the old service kept serving requests until the new one was fully available. Only then would it be torn down and enter shutdown state.
Recently, though, the old service sometimes isn't stopped correctly. I believe this started with docker v17.12.0-ce or v18.01.0-ce, although I may simply not have noticed it before.
When that happens it hangs around and keeps serving requests, resulting in us running a mix of old and new versions side by side indefinitely.
This happens both on swarms that run the service replicated and on one that runs it with scale=1.
What's worse, I cannot even kill the old containers. Here's what I've tried:
    docker service rm api_api
    docker stack rm api && docker stack deploy -c api.yml api
    docker rm -f <container id>
Nothing allows me to get rid of the 'zombie' container. In fact, docker rm -f <container id> even locks up and simply sits there.
The only way I've found to get rid of them is to restart the node. Thanks to replication I can afford to do that without downtime, but it's not great for various reasons, not least because of what may happen if another manager were to go down while I do that.
Has anyone else seen this behaviour? What might be the cause and how could I debug this?
By default, when an update to an individual task returns a state of RUNNING, the scheduler schedules another task to update until all tasks are updated. If, at any time during an update, a task returns FAILED, the scheduler pauses the update.
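The pause-on-failure behaviour described above can also be spelled out in the stack file. The following is only a sketch that extends the api.yml from the question: failure_action: pause is the documented default, while the parallelism, delay and monitor values are illustrative assumptions, not recommendations.

```yaml
version: '3.4'
services:
  api:
    image: registry.gitlab.com/myproj/api:${VERSION}
    deploy:
      update_config:
        order: start-first      # start the new task before stopping the old one
        parallelism: 1          # assumed value: update one task at a time
        delay: 10s              # assumed value: wait between updating tasks
        monitor: 30s            # assumed value: how long to watch a task for failure
        failure_action: pause   # default: pause the update if a task fails
```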
Swarm mode orchestration aims to make the current state match your target definition. Stopping all the containers therefore moves the current state away from the target, and swarm will try to correct that. You could change the target instead, but that needs to be done per service.
When running Docker Engine in swarm mode, you can use docker stack deploy to deploy a complete application stack to the swarm. The deploy command accepts a stack description in the form of a Compose file. The docker stack deploy command supports any Compose file of version “3.0” or above.
Use the --rollback option to roll back to the previous version of the service. This will revert the service to the configuration that was in place before the most recent docker service update command.
Try setting max_replicas_per_node (1 if you only need one replica per node) in the placement section.
Refer to https://docs.docker.com/compose/compose-file/compose-file-v3/
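A minimal sketch of that suggestion, assuming the stack file is moved to Compose file format 3.8 (the version in which max_replicas_per_node was added) and that the service runs replicated. The replicas value is an illustrative assumption. Whether this avoids the stuck containers described in the question is not confirmed.

```yaml
version: '3.8'
services:
  api:
    image: registry.gitlab.com/myproj/api:${VERSION}
    deploy:
      replicas: 3                  # assumed value for illustration
      placement:
        max_replicas_per_node: 1   # at most one task of this service per node
      update_config:
        order: start-first
```

Note that with start-first ordering, the new task needs a node that does not already hold a replica, so this limit may require a spare node during updates.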