I’m trying to use Traefik with Docker Swarm, but I’m having trouble during service updates. When I run a stack deploy or a service update, the service goes down for a few seconds.
How to reproduce:
1 - Create a Dockerfile:
FROM jwilder/whoami
RUN echo $(date) > daniel.txt
2 - Build 2 demo images:
$ docker build -t whoami:01 .
$ docker build -t whoami:02 .
3 - Create a docker-compose.yml:
version: '3.5'
services:
  app:
    image: whoami:01
    ports:
      - 81:8000
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      labels:
        - traefik.enable=true
        - traefik.backend=app
        - traefik.frontend.rule=Host:localhost
        - traefik.port=8000
        - traefik.docker.network=web
    networks:
      - web
  reverse-proxy:
    image: traefik
    command:
      - "--api"
      - "--docker"
      - "--docker.swarmMode"
      - "--docker.domain=localhost"
      - "--docker.watch"
      - "--docker.exposedbydefault=false"
      - "--docker.network=web"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      placement:
        constraints:
          - node.role == manager
    networks:
      - web
    ports:
      - 80:80
      - 8080:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
networks:
  web:
    external: true
4 - Deploy the stack:
$ docker stack deploy -c docker-compose.yml stack_name
5 - Curl to get the service response:
$ while true ; do sleep .1; curl localhost; done
You should see something like this:
I'm adc1473258e9
I'm bc82ea92b560
I'm adc1473258e9
I'm bc82ea92b560
That means load balancing is working.
6 - Update the service:
$ docker service update --image whoami:02 stack_name_app
Traefik responds with Bad Gateway during the update, when there should be zero downtime.
How can I fix it?
Bad Gateway means Traefik is configured to forward requests, but it cannot reach the container on the IP and port it is configured to use.
From the comments, this is only happening during the deploy, which means traefik is hitting containers before they are ready to receive requests, or while they are being stopped.
You can configure the containers with a healthcheck and send requests through swarm mode's VIP, using a Dockerfile that looks like:
FROM jwilder/whoami
RUN echo $(date) >/build-date.txt
HEALTHCHECK --start-period=30s --retries=1 CMD wget -O - -q http://localhost:8000
And then in the docker-compose.yml:
labels:
  - traefik.enable=true
  - traefik.backend=app
  - traefik.backend.loadbalancer.swarm=true
  ...
And I would also configure the traefik service with the following options:
- "--retry.attempts=2"
- "--forwardingTimeouts.dialTimeout=1s"
However, Traefik keeps a connection open, and the VIP continues to send all requests on that connection to the same backend container. What you can do instead is have Traefik itself perform the healthcheck:
labels:
  - traefik.enable=true
  - traefik.backend=app
  - traefik.backend.healthcheck.path=/
  ...
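Traefik's own healthcheck can be tuned with a couple more labels. The fragment below is only a sketch: it assumes Traefik 1.x label names, and the 5s interval and explicit port are illustrative values that are not part of the original answer. In swarm mode the labels go under deploy, as in the compose files in this thread.

```yaml
# Fragment of the app service definition (not a complete compose file).
deploy:
  labels:
    - traefik.enable=true
    - traefik.backend=app
    - traefik.backend.healthcheck.path=/
    # Assumed value: probe every 5 seconds so a stopping task is dropped sooner.
    - traefik.backend.healthcheck.interval=5s
    # Assumed value: probe the container port the whoami app listens on.
    - traefik.backend.healthcheck.port=8000
```

A shorter interval narrows the window in which Traefik still routes to a task that has already gone away, at the cost of more probe traffic. The retry option covers whatever window remains.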
I would still leave the healthcheck on the container itself so Docker gives the new container time to start before stopping the other one. And leave the retry option on the traefik service so that any request sent to a stopping container, or to one whose state the healthcheck has not yet detected, has a chance to be retried.
Here's the resulting compose file that I used in my environment:
version: '3.5'
services:
  app:
    image: test-whoami:1
    ports:
      - 6081:8000
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      labels:
        - traefik.enable=true
        - traefik.backend=app
        - traefik.backend.healthcheck.path=/
        - traefik.frontend.rule=Path:/
        - traefik.port=8000
        - traefik.docker.network=test_web
    networks:
      - web
  reverse-proxy:
    image: traefik
    command:
      - "--api"
      - "--retry.attempts=2"
      - "--forwardingTimeouts.dialTimeout=1s"
      - "--docker"
      - "--docker.swarmMode"
      - "--docker.domain=localhost"
      - "--docker.watch"
      - "--docker.exposedbydefault=false"
      - "--docker.network=test_web"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      placement:
        constraints:
          - node.role == manager
    networks:
      - web
    ports:
      - 6080:80
      - 6880:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
networks:
  web:
The Dockerfile is as quoted above. Image names, ports, network names, etc. were changed to avoid conflicting with other things in my environment.
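If rebuilding the image is inconvenient, the same container healthcheck can probably be declared in the compose file instead of the Dockerfile. This is a sketch, not something I tested in this setup: it assumes compose file format 3.4 or later (needed for start_period and order), and the start-first update order is an optional addition of mine, not part of the configuration above.

```yaml
# Fragment of the app service; merge it into the full compose file.
services:
  app:
    image: test-whoami:1
    healthcheck:
      # Mirrors the Dockerfile HEALTHCHECK quoted earlier (shell form).
      test: ["CMD-SHELL", "wget -O - -q http://localhost:8000"]
      start_period: 30s
      retries: 1
    deploy:
      replicas: 2
      update_config:
        parallelism: 1
        # Assumption: start the replacement task before stopping the old one.
        order: start-first
        failure_action: rollback
```

With a healthcheck defined, swarm waits for the new task to report healthy before it moves on to the next replica, which is what gives the rolling update its overlap.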