
The "--cluster-store" and "--cluster-advertise" don't work

Tags:

docker

I'm trying to set up a Docker cluster with Swarm and Consul. I have three machines: manager, host1, and host2.
I run the Consul and Swarm manager containers on the manager machine:

$ docker run --rm -p 8500:8500 progrium/consul -server -bootstrap
$ docker run -d -p 2377:2375 swarm manage consul://<manager>:8500

On host1 and host2, I set the daemon options --cluster-store and --cluster-advertise and restart the Docker daemon:

host1
DOCKER_OPTS="--cluster-store=consul://<manager>:8500 --cluster-advertise=<host1>:2375"
host2
DOCKER_OPTS="--cluster-store=consul://<manager>:8500 --cluster-advertise=<host2>:2375"

When I join host1 and host2 to the swarm, the join fails:

host1 $ docker run --rm swarm join --advertise=<host1>:2375 consul://<manager>:8500
host2 $ docker run --rm swarm join --advertise=<host2>:2375 consul://<manager>:8500

The Swarm manager log shows these errors:

time="2016-01-20T02:17:17Z" level=error msg="Get http://<host1>:2375/v1.15/info: dial tcp <host1>:2375: getsockopt: connection refused"
time="2016-01-20T02:17:20Z" level=error msg="Get http://<host2>:2375/v1.15/info: dial tcp <host2>:2375: getsockopt: connection refused"
asked Jan 20 '16 by firelyu

1 Answer

Since I've run into a similar problem as well, I did eventually find out why it didn't work. In my example I'm using multiple boxes on a LAN (192.168.10.0/24) that I want to manage from inside that network while only allowing outside access to certain containers; the following examples are run on the box at 192.168.10.1:

  • Set up the Daemons with --cluster-store consul://192.168.10.1:8500 on port 8500 (deploying Consul & registrator on each Daemon as the first containers) and --cluster-advertise 192.168.10.1:2375, as well as -H tcp://192.168.10.1:2375 -H unix:///var/run/docker.sock -H tcp://127.0.0.1:2375. I do not, however, bind to all available addresses (as you would with tcp://0.0.0.0:2375) and instead bind only to the local 192.168.10.0/24. If you want containers to bind only to the local network as well (as I did in this case), you can specify the additional --ip parameter for the Daemon; when a container should be reachable from everywhere else too (in my case only an nginx load balancer with failover via keepalived), you bind its port to all interfaces: docker run ... -p 0.0.0.0:host_port:container_port ... <image>. A sketch of the combined options follows below.
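
    For example, the combined daemon options on the 192.168.10.1 box could look roughly like this (a sketch assuming an Ubuntu-style /etc/default/docker; on systemd distributions the flags go into the service unit instead, and --ip is only needed if containers should bind to the LAN address by default):

    DOCKER_OPTS="-H tcp://192.168.10.1:2375 -H tcp://127.0.0.1:2375 \
      -H unix:///var/run/docker.sock \
      --cluster-store=consul://192.168.10.1:8500 \
      --cluster-advertise=192.168.10.1:2375 \
      --ip=192.168.10.1"
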
  • Start the Daemons
  • Deploy gliderlabs/registrator and Consul with Compose (this is the example from the first box in my setup, but I start the equivalent on all Daemons for a complete Consul HA failover setup) via docker-compose -p bootstrap up -d, naming the containers bootstrap_registrator_1 and bootstrap_consul_1 in the private network bootstrap:

    version: '2'
    services:
      registrator:
        image: gliderlabs/registrator
        command: consul://192.168.10.1:8500
        depends_on:
          - consul
        volumes:
          - /var/run/docker.sock:/tmp/docker.sock
        restart: unless-stopped
    
      consul:
        image: consul
        command: agent -server -bootstrap -ui -advertise 192.168.10.1 -client 0.0.0.0
        hostname: srv-0
        network_mode: host
        # note: with network_mode: host the published ports below are informational only
        ports:
          - "8300:8300"     # Server RPC, Server Use Only
          - "8301:8301/tcp" # Serf Gossip Protocol for LAN
          - "8301:8301/udp" # Serf Gossip Protocol for LAN
          - "8302:8302/tcp" # Serf Gossip Protocol for WAN, Server Use Only
          - "8302:8302/udp" # Serf Gossip Protocol for WAN, Server Use Only
          - "8400:8400"     # CLI RPC
          - "8500:8500"     # HTTP API & Web UI
          - "53:8600/tcp"   # DNS Interface
          - "53:8600/udp"   # DNS Interface
        restart: unless-stopped
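
    Once the Daemons are restarted with the --cluster-* options, you can verify that they registered themselves in Consul by listing the keys below docker/nodes through Consul's standard KV HTTP API:

    $ curl 'http://192.168.10.1:8500/v1/kv/docker/nodes?keys'

    This should return one key per registered Daemon (see the next point for why Swarm has to be pointed at this path explicitly).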
    
  • Now the Daemons register themselves and set locks in the KV store (Consul) under docker/nodes, but Swarm does not automatically read from this location, so when it tries to find out which Daemons are available it finds none. This bit cost me the most time: to solve it I had to specify --discovery-opt kv.path=docker/nodes and start Swarm with docker-compose -p bootstrap up -d on all boxes as well, to end up with an HA failover setup of Swarm managers:

    version: '2'
    services:
      swarm-manager:
        image: swarm
        command: manage -H :3375 --replication --advertise 192.168.10.1:3375 --discovery-opt kv.path=docker/nodes consul://192.168.10.1:8500
        hostname: srv-0
        ports:
          - "192.168.10.1:3375:3375" #
        restart: unless-stopped
    
  • Now I end up with a working Swarm that is only reachable on the 192.168.10.0/24 network on port 3375. All started containers are likewise only reachable on this network, unless I specify -p 0.0.0.0:host_port:container_port when starting them (with docker run).
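
    To confirm it works from inside the LAN, point a regular Docker client at the replicated manager endpoint:

    $ docker -H tcp://192.168.10.1:3375 info

    Once discovery via docker/nodes works, the output lists every joined Daemon as a node.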

  • Further scaling: when I add more boxes to the local network to grow capacity, my idea would be to add more Daemons and possibly non-manager Swarm instances on them, as well as Consul clients later on (rather than servers, which are started with -server); a sketch follows below.
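
    As a rough sketch (the 192.168.10.4 address is made up for illustration), such a Consul client agent could reuse the Compose layout from above, dropping -server/-bootstrap and joining an existing server instead:

    version: '2'
    services:
      consul:
        image: consul
        # client agent: no -server flag; -retry-join points at an existing server box
        command: agent -retry-join 192.168.10.1 -advertise 192.168.10.4 -client 0.0.0.0
        network_mode: host
        restart: unless-stopped
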
answered Sep 25 '22 by Jan