Bad gateway with traefik and docker swarm during service update

I'm trying to use Traefik with Docker Swarm, but I'm having trouble during service updates. Whenever I run a stack deploy or a service update, the service goes down for a few seconds.

How to reproduce:

1 - Create a Dockerfile:

FROM jwilder/whoami
RUN echo $(date) > daniel.txt

2 - Build 2 demo images:

$ docker build -t whoami:01 .
$ docker build -t whoami:02 .

3 - Create a docker-compose.yml:

version: '3.5'

services:
  app:
    image: whoami:01
    ports:
      - 81:8000
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      labels:
        - traefik.enable=true
        - traefik.backend=app
        - traefik.frontend.rule=Host:localhost
        - traefik.port=8000
        - traefik.docker.network=web
    networks:
      - web

  reverse-proxy:
    image: traefik
    command: 
      - "--api"
      - "--docker"
      - "--docker.swarmMode"
      - "--docker.domain=localhost"
      - "--docker.watch"
      - "--docker.exposedbydefault=false"
      - "--docker.network=web"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      placement:
        constraints:
          - node.role == manager
    networks:
      - web
    ports:
      - 80:80
      - 8080:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

networks:
  web:
    external: true

4 - Deploy the stack:

$ docker stack deploy -c docker-compose.yml stack_name

5 - Curl to get the service response:

$ while true ; do sleep .1; curl localhost; done

You should see something like this:

I'm adc1473258e9
I'm bc82ea92b560
I'm adc1473258e9
I'm bc82ea92b560

That means load balancing is working.
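To quantify failures rather than eyeball them, a variant of the loop above can print only the HTTP status code of each request and tally the results. This is a minimal sketch: the function names `probe` and `summarize_statuses` are made up for illustration, and it assumes `curl` and `awk` are available on the host.

```shell
#!/bin/sh
# probe URL COUNT
# Send COUNT requests to URL, printing one HTTP status code per line.
# curl prints 000 when it cannot connect at all.
probe() {
  url=$1
  count=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
    sleep 0.1
    i=$((i + 1))
  done
}

# summarize_statuses
# Read status codes from stdin (one per line) and report how many
# requests were made and how many returned something other than 200.
summarize_statuses() {
  awk '
    { total++; if ($1 != "200") failed++ }
    END { printf "total=%d failed=%d\n", total, failed }
  '
}
```

Running `probe http://localhost 300 | summarize_statuses` in one terminal while the update in step 6 runs in another would show how many requests failed during the rollout.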

6 - Update the service

$ docker service update --image whoami:02 stack_name_app

Traefik responds with Bad Gateway during the update, when there should be zero downtime.

How to fix it?

Asked Apr 16 '19 at 19:04 by Daniel



1 Answer

Bad gateway means traefik is configured to forward requests, but it's not able to reach the container on the IP and port that it's configured to use. Common issues causing this are:

  • traefik and the service on different docker networks
  • service exists in multiple networks and traefik picks the wrong one
  • wrong port being used to connect to the container (use the container port and make sure it's listening on all interfaces, aka 0.0.0.0)
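The first two items can be checked from a manager node by comparing the networks attached to each service. The sketch below is an assumption-laden helper, not part of the original setup: the function names are hypothetical, and it assumes the service spec lists its networks under `.Spec.TaskTemplate.Networks` (as recent Docker versions do), with `.Target` typically holding the network ID.

```shell
#!/bin/sh
# service_networks SERVICE
# Print the network targets attached to a swarm service, one per line.
# Assumes networks are recorded under .Spec.TaskTemplate.Networks.
service_networks() {
  docker service inspect \
    --format '{{range .Spec.TaskTemplate.Networks}}{{println .Target}}{{end}}' \
    "$1" </dev/null
}

# shared_networks SERVICE_A SERVICE_B
# Print every network that both services are attached to.
# No output means the two services cannot reach each other directly.
shared_networks() {
  service_networks "$1" | while read -r net; do
    [ -n "$net" ] || continue
    if service_networks "$2" | grep -Fxq -- "$net"; then
      printf '%s\n' "$net"
    fi
  done
}
```

For the stack in the question, `shared_networks stack_name_app stack_name_reverse-proxy` should print exactly one entry, and that entry should correspond to the network named in `traefik.docker.network`.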

From the comments, this is only happening during the deploy, which means traefik is hitting containers before they are ready to receive requests, or while they are being stopped.

You can configure containers with a healthcheck and send requests through swarm mode's VIP, using a Dockerfile that looks like:

FROM jwilder/whoami
RUN echo $(date) >/build-date.txt
HEALTHCHECK --start-period=30s --retries=1 CMD wget -O - -q http://localhost:8000

And then in the docker-compose.yml:

  labels:
    - traefik.enable=true
    - traefik.backend=app
    - traefik.backend.loadbalancer.swarm=true
    ...

And I would also configure the traefik service with the following options:

  - "--retry.attempts=2"
  - "--forwardingTimeouts.dialTimeout=1s"

However, traefik will keep a connection open and the VIP will continue to send all requests to the same backend container over that same connection. What you can do instead is have traefik itself perform the healthcheck:

  labels:
    - traefik.enable=true
    - traefik.backend=app
    - traefik.backend.healthcheck.path=/
    ...

I would still leave the healthcheck on the container itself so Docker gives the new container time to start before stopping the old one. I would also leave the retry option on the traefik service so that any request to a stopping container, or to one whose failure the healthcheck hasn't detected yet, gets a chance to try again.
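To confirm that the container-level healthcheck is actually reporting, the health status Docker records for a container can be polled with `docker inspect`. The helper below is only a sketch; the name `wait_healthy` and the default timeout are arbitrary choices for illustration.

```shell
#!/bin/sh
# wait_healthy CONTAINER [TIMEOUT_SECONDS]
# Poll Docker's recorded health status for a container once per second.
# Returns 0 as soon as the status is "healthy"; returns 1 if the status
# becomes "unhealthy" or the timeout (default 60 seconds) expires.
wait_healthy() {
  container=$1
  timeout=${2:-60}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
    case "$status" in
      healthy)   return 0 ;;
      unhealthy) return 1 ;;
    esac
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}
```

For example, `wait_healthy <container-id> 45 && echo ready` would print `ready` only once the HEALTHCHECK from the Dockerfile has passed. With `--start-period=30s`, the status may stay at `starting` for a while before it changes.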


Here's the resulting compose file that I used in my environment:

version: '3.5'

services:
  app:
    image: test-whoami:1
    ports:
      - 6081:8000
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      labels:
        - traefik.enable=true
        - traefik.backend=app
        - traefik.backend.healthcheck.path=/
        - traefik.frontend.rule=Path:/
        - traefik.port=8000
        - traefik.docker.network=test_web
    networks:
      - web

  reverse-proxy:
    image: traefik
    command:
      - "--api"
      - "--retry.attempts=2"
      - "--forwardingTimeouts.dialTimeout=1s"
      - "--docker"
      - "--docker.swarmMode"
      - "--docker.domain=localhost"
      - "--docker.watch"
      - "--docker.exposedbydefault=false"
      - "--docker.network=test_web"
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        failure_action: rollback
      placement:
        constraints:
          - node.role == manager
    networks:
      - web
    ports:
      - 6080:80
      - 6880:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

networks:
  web:

The Dockerfile is as quoted above. Image names, ports, network names, etc. were changed to avoid conflicting with other things in my environment.

Answered Oct 23 '22 at 09:10 by BMitch