 

How do I make nginx wait for my upstream service to start up in a Docker Swarm?

I deploy an nginx proxy service and a rails app service into a docker swarm. The nginx depends on the app in my docker-compose file.

My nginx.conf file directs traffic to my upstream app service (exposed on port 3000) like so (only showing the upstream part).

upstream puma {
  server app:3000;
}

My docker-compose file looks like so:

version: '3.1'

services:

  app:
    image: my/rails-app:latest
    networks:
      - proxy

  web:
    image: my/nginx:1.11.9-alpine
    command: /bin/sh -c "nginx -g 'daemon off;'"
    ports:
      - "80:80"
    depends_on:
      - app
    networks:
      - proxy


networks:

  proxy:
    external: true

My host is set up as the swarm manager.

Most of the time this all works fine.

However, even though I have a depends_on section in my docker-compose file, the app service may not be completely ready by the time the nginx service starts up. When nginx resolves the upstream "app:3000" at startup, it does not seem to find the right address. So when I visit my site, I find the following error message in my nginx logs:

2017/02/13 10:46:07 [error] 8#8: *6 connect() failed (111: Connection refused) while connecting to upstream, client: 10.255.0.3, server: www.mysite.com, request: "GET / HTTP/1.1", upstream: "http://127.0.53.53:3000/", host: "preprod.local"

If I kill the Docker container that is running the nginx service and swarm reschedules it a moment later, visiting the same URL works completely fine, and the request is passed successfully upstream to app:3000.

How can I prevent this from happening? The startup timings are out by a little bit, so when nginx starts it can't yet properly resolve my swarm service called app, and instead it attempts to pass the traffic on to the wrong IP address (127.0.53.53 in the log above).

BTW - the same happens if I reboot my virtual machine - when docker (in swarm mode) brings up the services again - I can end up with the same problem. Restarting the nginx container solves the problem.

asked Feb 13 '17 at 11:02 by Joerg




2 Answers

The depends_on option doesn't wait for a container to be ready, only until it is running. See https://docs.docker.com/compose/startup-order/

There are two other options:

  1. Starting with Compose file format version 2.1, depends_on can require that a dependency's healthcheck passes. See https://docs.docker.com/compose/compose-file/compose-file-v2/#dependson
  2. You can do the same using external tools like dockerize or wait-for-it.
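
As a rough illustration of the second option, a wait-for-it style wrapper can be sketched in plain POSIX shell. This is not the actual wait-for-it or dockerize script: the function names probe and wait_for_upstream are made up for this example, and it assumes an nc binary that supports the -z flag is available inside the nginx image.

```shell
#!/bin/sh
# Hypothetical helper, not part of wait-for-it or dockerize.

# probe HOST PORT: succeeds if a TCP connection can be opened.
# Assumption: the image ships an nc that understands -z.
probe() {
  nc -z "$1" "$2" 2>/dev/null
}

# wait_for_upstream HOST PORT [TIMEOUT_SECONDS]
# Polls once per second until the port answers or the timeout expires.
# Returns 0 when the upstream is reachable, 1 on timeout.
wait_for_upstream() {
  host=$1
  port=$2
  timeout=${3:-30}
  elapsed=0
  until probe "$host" "$port"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for $host:$port" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

With these functions in an entrypoint script, the web service's command could become something like wait_for_upstream app 3000 60 && exec nginx -g 'daemon off;', so that nginx only starts (and resolves app) once the port answers.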
answered Oct 08 '22 at 12:10 by andrey


I have figured out a way to do this: use the HEALTHCHECK instruction in the Dockerfile, or the healthcheck section of the docker-compose file.

First of all, it seems that the depends_on option isn't really used when deploying a stack with

docker stack deploy -c docker-compose.yml mystack

Docker in swarm mode just restarts a service task if it wasn't able to start properly or failed for some other reason, so the depends_on option isn't really that useful there.

So this is my solution in the end, and so far it works very well:

version: '3.1'

services:

  app:
    image: my/rails-app:latest
    networks:
      - proxy

  web:
    image: my/nginx:1.11.9-alpine
    command: /bin/sh -c "nginx -g 'daemon off;'"
    ports:
      - "80:80"
    networks:
      - proxy
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost/healthcheck"]
      interval: 5s
      timeout: 3s
      retries: 3

networks:

  proxy:
    external: true

So what I do is, from the nginx container, try to access a route on my Rails app through the proxy. I created one called /healthcheck, and it returns a status code of 200.

If that request fails (because the app server is not ready yet), the healthcheck fails and nginx will be restarted. Hopefully when it starts up again, the app server will be available, and the upstream app:3000 directive will resolve correctly.

So in this way I've "hacked" together the (missing) depends_on behaviour in a way that works in swarm mode.
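
For reference, Docker decides health from the exit status of the test command: 0 means healthy and 1 means unhealthy (2 is reserved). The wget test above could be wrapped in a small script to make that explicit. This is only a minimal sketch: the names fetch and check_health and the 3-second wget timeout are assumptions for illustration, not part of my actual setup.

```shell
#!/bin/sh
# Hypothetical healthcheck helper; names and timeout are illustrative.

# fetch URL: request the URL quietly, discard the body, give up after 3s.
fetch() {
  wget -q -O /dev/null -T 3 "$1"
}

# check_health URL
# Returns 0 if the URL could be fetched, 1 otherwise. Any wget failure
# code is normalized to 1, since Docker reserves exit status 2.
check_health() {
  if fetch "$1"; then
    return 0
  fi
  return 1
}
```

A script built from this would end with a line such as check_health http://localhost/healthcheck and be referenced from the healthcheck test entry instead of calling wget directly.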

answered Oct 08 '22 at 14:10 by Joerg