Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Should swarm loadbalancing perform healthchecks on its nodes?

The Load Balancing section in the swarm docs don't make it clear if the internal loadbalancer also does health checks, and if it removes nodes that aren't running the service anymore (because it got killed or the node got rebooted).

In the following case I've got a service with replicas 3, 1 instance running on each of the 3 nodes.

Manager:

[root@centosvm ~]# docker ps
CONTAINER ID        IMAGE                                    COMMAND                  CREATED             STATUS              PORTS               NAMES
a593d485050a        ddewaele/springboot.crud.sample:latest   "sh -c 'java $JAVA_OP"   7 minutes ago       Up 7 minutes                            springbootcrudsample.1.5syc6j4c8i3bnerdqq4e1yelm

Node1:

[root@node1 ~]# docker ps
CONTAINER ID        IMAGE                                    COMMAND                  CREATED             STATUS              PORTS               NAMES
d3b3fbc0f2c5        ddewaele/springboot.crud.sample:latest   "sh -c 'java $JAVA_OP"   4 minutes ago       Up 4 minutes                            springbootcrudsample.3.7y1oyjyrifgkmxlr20oai5ppl

Node 2:

[root@node2 ~]# docker ps
CONTAINER ID        IMAGE                                    COMMAND                  CREATED             STATUS              PORTS               NAMES
ebca8f24ec3a        ddewaele/springboot.crud.sample:latest   "sh -c 'java $JAVA_OP"   7 minutes ago       Up 7 minutes                            springbootcrudsample.2.4tqjad7od8ep047s55485na1t

Now, on node1, we kill the docker container. This node will be without a service (swarm will re-create it here after a couple of seconds to keep the replication=3 on the service)

[root@node1 ~]# docker kill d3b3fbc0f2c5
d3b3fbc0f2c5

Container gone

[root@node1 ~]# docker ps
CONTAINER ID        IMAGE                                    COMMAND                  CREATED             STATUS              PORTS               NAMES

New container up

[root@node1 ~]# docker ps
CONTAINER ID        IMAGE                                    COMMAND                  CREATED             STATUS              PORTS               NAMES
b8c9a7a5cf97        ddewaele/springboot.crud.sample:latest   "sh -c 'java $JAVA_OP"   11 seconds ago      Up 9 seconds                            springbootcrudsample.3.9v4cnhi8dvq7n8afb2kvp28sk

In the output below however, when container d3b3fbc0f2c5 was killed, the ingress loadbalancer didn't detect this, and it was still sending traffic to the node (resulting in connection refused) ?

How should we handle such a scenario ? Do we still need an external loadbalancer for this scenario and how should we configure it ?

[root@centosvm ~]# while :; do curl http://localhost:8080/env/hostname ; echo "" ; sleep 1; done
{"hostname":"d3b3fbc0f2c5"}
{"hostname":"a593d485050a"}
{"hostname":"ebca8f24ec3a"}
{"hostname":"d3b3fbc0f2c5"}
{"hostname":"a593d485050a"}
{"hostname":"ebca8f24ec3a"}
{"hostname":"d3b3fbc0f2c5"}
{"hostname":"a593d485050a"}
{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
curl: (7) Failed connect to localhost:8080; Connection refused

{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
curl: (7) Failed connect to localhost:8080; Connection refused

{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
curl: (7) Failed connect to localhost:8080; Connection refused

{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
curl: (7) Failed connect to localhost:8080; Connection refused

{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
curl: (7) Failed connect to localhost:8080; Connection refused

{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
curl: (7) Failed connect to localhost:8080; Connection refused

{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
{"hostname":"b8c9a7a5cf97"}
{"hostname":"ebca8f24ec3a"}
{"hostname":"a593d485050a"}
{"hostname":"b8c9a7a5cf97"}
like image 504
ddewaele Avatar asked Feb 04 '23 16:02

ddewaele


1 Answers

As indicated by François Maturel, with a proper healthcheck in place, Docker Swarm will take into account the health status of the container to decide if it will route requests to it.

For Spring Boot applications that have enabled the default actuators, adding this to the Dockerfile is sufficient for a basic healthcheck. When the Spring Boot app is initialized and its health actuator is enabled, the following http request will return a valid http 200 response and the healthcheck will pass.

HEALTHCHECK CMD wget -q http://localhost:8080/health -O /dev/null

This will result in your docker containers being anble to reach a healthy status. When your docker container is started, the service running within it might still be initializing. To do proper load balancing and detect service health, Swarm needs to know when it is able to route reqeusts to a particular service instance (container on a node).

So when Swarm starts a service replica, it fires up a container, it will wait until the health status of the service is "healthy". As your container is starting, it will transition from "starting" :

CONTAINER ID        IMAGE                                                                                                     COMMAND                  CREATED             STATUS                                     PORTS               NAMES
5001e1c46953        ddewaele/springboot.crud.sample@sha256:4ce69c3f50c69640c8240f9df68c8816605c6214b74e6581be44ce153c0f3b7a   "/docker-entrypoin..."   5 seconds ago       Up Less than a second (health: starting)                       springbootcrudsample.2.yt6d38zhhq2wxt1d6qfjz5974

to 'healthy'. Only then will the Swarm load balancer route requests to this endpoint.

[root@centos-a ~]# docker ps
CONTAINER ID        IMAGE                                                                                                     COMMAND                  CREATED              STATUS                        PORTS               NAMES
5001e1c46953        ddewaele/springboot.crud.sample@sha256:4ce69c3f50c69640c8240f9df68c8816605c6214b74e6581be44ce153c0f3b7a   "/docker-entrypoin..."   About a minute ago   Up About a minute (healthy)                       springbootcrudsample.2.yt6d38zhhq2wxt1d6qfjz5974
like image 173
ddewaele Avatar answered Feb 07 '23 06:02

ddewaele