Bad gateway with traefik and docker swarm during service update

I'm trying to use Traefik with Docker Swarm, but I'm having trouble during service updates. When I run a stack deploy or a service update, the service goes down for a few seconds.

How to reproduce:

1 - Create a Dockerfile:

FROM jwilder/whoami
RUN echo $(date) > daniel.txt

2 - Build 2 demo images:

$ docker build -t whoami:01 .
$ docker build -t whoami:02 .

3 - Create a docker-compose.yml:

version: '3.5'

services:
  app:
    image: whoami:01
    ports:
      - 81:8000
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      labels:
        - traefik.enable=true
        - traefik.backend=app
        - traefik.frontend.rule=Host:localhost
        - traefik.port=8000
        - traefik.docker.network=web
    networks:
      - web

  reverse-proxy:
    image: traefik
    command: 
      - "--api"
      - "--docker"
      - "--docker.swarmMode"
      - "--docker.domain=localhost"
      - "--docker.watch"
      - "--docker.exposedbydefault=false"
      - "--docker.network=web"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      placement:
        constraints:
          - node.role == manager
    networks:
      - web
    ports:
      - 80:80
      - 8080:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

networks:
  web:
    external: true

4 - Deploy the stack:

$ docker stack deploy -c docker-compose.yml stack_name

5 - Curl to get the service response:

$ while true ; do sleep .1; curl localhost; done

You should see something like this:

I'm adc1473258e9
I'm bc82ea92b560
I'm adc1473258e9
I'm bc82ea92b560

That means load balancing is working.

6 - Update the service

$ docker service update --image whoami:02 stack_name_app

Traefik responds with Bad Gateway during the update, when there should be zero downtime.

How to fix it?

asked Apr 16 '19 19:04

Daniel

1 Answer

Bad gateway means traefik is configured to forward requests, but it's not able to reach the container on the ip and port that it's configured to use. Common issues causing this are:

traefik and the service on different docker networks
service exists in multiple networks and traefik picks the wrong one
wrong port being used to connect to the container (use the container port and make sure it's listening on all interfaces, aka 0.0.0.0)

From the comments, this is only happening during the deploy, which means traefik is hitting containers before they are ready to receive requests, or while they are being stopped.

You can configure containers with a healthcheck and send requests through swarm mode's VIP using a Dockerfile that looks like:

FROM jwilder/whoami
RUN echo $(date) >/build-date.txt
HEALTHCHECK --start-period=30s --retries=1 CMD wget -O - -q http://localhost:8000

And then in the docker-compose.yml:

  labels:
    - traefik.enable=true
    - traefik.backend=app
    - traefik.backend.loadbalancer.swarm=true
    ...

And I would also configure the traefik service with the following options:

  - "--retry.attempts=2"
  - "--forwardingTimeouts.dialTimeout=1s"

However, traefik will keep a connection open and the VIP will continue to send all requests to the same backend container over that same connection. What you can do instead is have traefik itself perform the healthcheck:

  labels:
    - traefik.enable=true
    - traefik.backend=app
    - traefik.backend.healthcheck.path=/
    ...

I would still leave the healthcheck on the container itself so Docker gives the new container time to start before stopping the old one. And leave the retry option on the traefik service so any request to a stopping container, or to one whose state hasn't yet been detected by the healthcheck, has a chance to try again.
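To see what Docker itself reports for the container healthcheck, something like the sketch below can help. It assumes the image defines a HEALTHCHECK (docker inspect returns a template error otherwise); container_health and wait_healthy are hypothetical helper names, not part of Docker.

```shell
#!/bin/sh
# Sketch: poll the health status Docker reports for one container.
# Assumes the container's image defines a HEALTHCHECK.

# Print the health status (starting, healthy or unhealthy) of a container.
container_health() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Poll until the container is healthy, or give up after a number of attempts.
# Usage: wait_healthy <container> [attempts] [delay_seconds]
wait_healthy() {
  container=$1
  attempts=${2:-30}
  delay=${3:-1}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$(container_health "$container")
    if [ "$status" = "healthy" ]; then
      echo "healthy after $i retries"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up: last status was '$status'" >&2
  return 1
}
```

For example, wait_healthy followed by a container ID taken from docker ps polls once per second for up to 30 seconds. With --start-period=30s in the Dockerfile, failed probes during the first 30 seconds are not counted against the retry limit, so a new container typically shows "starting" until a probe succeeds.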

Here's the resulting compose file that I used in my environment:

version: '3.5'

services:
  app:
    image: test-whoami:1
    ports:
      - 6081:8000
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      labels:
        - traefik.enable=true
        - traefik.backend=app
        - traefik.backend.healthcheck.path=/
        - traefik.frontend.rule=Path:/
        - traefik.port=8000
        - traefik.docker.network=test_web
    networks:
      - web

  reverse-proxy:
    image: traefik
    command:
      - "--api"
      - "--retry.attempts=2"
      - "--forwardingTimeouts.dialTimeout=1s"
      - "--docker"
      - "--docker.swarmMode"
      - "--docker.domain=localhost"
      - "--docker.watch"
      - "--docker.exposedbydefault=false"
      - "--docker.network=test_web"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      placement:
        constraints:
          - node.role == manager
    networks:
      - web
    ports:
      - 6080:80
      - 6880:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

networks:
  web:

The Dockerfile is as quoted above. Image names, ports, network names, etc. were changed to avoid conflicting with other things in my environment.
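One way to check whether an update still produces Bad Gateway responses is to count status codes while the update runs. This is only a rough sketch: probe and summarize are hypothetical helper names, and the URL in the usage note assumes the 6080 published port from the compose file above.

```shell
#!/bin/sh
# Sketch: count the HTTP status codes seen while a service update is running.

# Request the URL a number of times and print one status code per line.
# curl prints 000 when it cannot connect at all.
# Usage: probe <url> [count]
probe() {
  url=$1
  count=${2:-100}
  i=0
  while [ "$i" -lt "$count" ]; do
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
    i=$((i + 1))
    sleep 0.1
  done
}

# Read status codes on stdin and print "<count> <code>" per distinct code.
summarize() {
  sort | uniq -c | awk '{ print $1, $2 }'
}
```

Running probe http://localhost:6080/ 300 | summarize in one terminal while running docker service update in another should print only a line for 200 if the update is clean. Any line for 502 means traefik still forwarded some requests to a container that was not reachable.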

answered Oct 23 '22 09:10

BMitch

Tags: docker, deployment, load-balancing, docker-swarm, traefik
