
Why does NGINX load balancer passive health check not detect when upstream server is offline?


I have an upstream block in an Nginx config file. This block lists multiple backend servers across which requests are load balanced.

...
upstream backend {
    server backend1.com;
    server backend2.com;
    server backend3.com;
}
...

Each of the above 3 backend servers is running a Node application.

  1. If I stop the application process on backend1, Nginx recognises this via its passive health check and traffic is directed only to backend2 and backend3, as expected.
  2. However, if I power down the server on which backend1 is hosted, Nginx does not recognise that it is offline and continues to direct traffic/requests to it, resulting in a 504 error.

Can someone shed some light on why this (scenario 2 above) may happen, and whether there is some further configuration that I am missing?

Update: I'm beginning to wonder if the behaviour I'm seeing is because the above upstream block is located within an http {} Nginx context. If backend1 were indeed powered down, this would be a connection error, so (maybe off the mark here, but just thinking aloud) should this be a TCP health check?
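If that were the case, I imagine the upstream would have to live in a stream {} block rather than in http {}. A rough sketch of what I have in mind, assuming the stream module is available; the listen port and the 3000 backend port are placeholders, not taken from the config below:

    stream {
        upstream tcp_backends {
            server backend1.com:3000;
            server backend2.com:3000;
            server backend3.com:3000;
        }

        server {
            listen 12345;
            # Passive checks behave the same way here: a failed TCP connect
            # counts towards max_fails within fail_timeout
            proxy_pass tcp_backends;
            proxy_connect_timeout 2s;
        }
    }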

Update 2:

nginx.conf

user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 768;
    # multi_accept on;
}

http {


    upstream backends {
        server xx.xx.xx.37:3000 fail_timeout=2s;
        server xx.xx.xx.52:3000 fail_timeout=2s;
        server xx.xx.xx.69:3000 fail_timeout=2s;
    }

    ##
    # Basic Settings
    ##

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    # server_tokens off;

    # server_names_hash_bucket_size 64;
    # server_name_in_redirect off;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ##
    # SSL Settings
    ##
    ssl_certificate     …
    ssl_certificate_key …
    ssl_ciphers         …;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
    ssl_prefer_server_ciphers on;

    ##
    # Logging Settings
    ##

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ##
    # Gzip Settings
    ##

    gzip on;

    # gzip_vary on;
    # gzip_proxied any;
    # gzip_comp_level 6;
    # gzip_buffers 16 8k;
    # gzip_http_version 1.1;
    # gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    ##
    # Virtual Host Configs
    ##

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

default

server {
    listen 80;
    listen [::]:80;
    return 301 https://$host$request_uri;
    #server_name ...;
}
server {

    listen              443 ssl;
    listen              [::]:443 ssl;
    # SSL configuration
    ...
    # Add index.php to the list if you are using PHP
    index index.html index.htm;

    server_name _;

    location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ /index.html;
        #try_files $uri $uri/ =404;
    }

    location /api {
        rewrite /api/(.*) /$1 break;
        proxy_pass http://backends;
        proxy_redirect     off;
        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Host $server_name;
    }

    # Requests for socket.io are passed on to Node on port 3000
    location /socket.io/ {
        proxy_http_version 1.1;

        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_pass http://backends;
    }
}
Asked Jul 23 '20 by Kuyashii


People also ask

How does Nginx health check work?

NGINX Plus sends special health check requests to each upstream server and checks for a response that satisfies certain conditions. If a connection to the server cannot be established, the health check fails, and the server is considered unhealthy. NGINX Plus does not proxy client connections to unhealthy servers.
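As an illustration only (this directive is part of the commercial NGINX Plus product, not open-source NGINX), an active check is declared with the health_check directive inside the proxied location; the interval/fails/passes values below are arbitrary examples:

    location /api {
        proxy_pass http://backends;
        # NGINX Plus only: probe each upstream server every 5 seconds, mark it
        # unhealthy after 3 failed probes and healthy again after 2 passing ones
        # (the upstream block must also define a shared memory zone,
        #  e.g. "zone backends 64k;", for active checks to work)
        health_check interval=5s fails=3 passes=2;
    }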

How do I know if Nginx load balancing is working?

To test Nginx load balancing, open a web browser and navigate to the load balancer's address. Once the website interface loads, take note of which application instance served it, then refresh the page repeatedly. At some point the app should be served from the second server, indicating that load balancing is working.
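If refreshing in a browser is too coarse, one option is to surface Nginx's $upstream_addr variable in a response header and watch it change across requests. A small sketch; the header name is made up and the line should be removed once verified:

    location /api {
        proxy_pass http://backends;
        # Expose the address of the backend that actually handled the request,
        # purely to confirm that requests rotate across the upstream servers
        add_header X-Upstream-Addr $upstream_addr always;
    }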

What does upstream do in Nginx?

The servers that Nginx proxies requests to are known as upstream servers. Nginx can proxy requests to servers that communicate using the HTTP(S), FastCGI, SCGI, uwsgi, or memcached protocols, through a separate set of directives for each type of proxy.
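For example, a named upstream group is referenced through the directive that matches the backend protocol, such as proxy_pass for HTTP backends versus fastcgi_pass for FastCGI backends; the addresses and ports below are placeholders:

    upstream app_http    { server 127.0.0.1:3000; }
    upstream app_fastcgi { server 127.0.0.1:9000; }

    server {
        listen 8080;

        location / {
            # HTTP backend
            proxy_pass http://app_http;
        }

        location ~ \.php$ {
            # FastCGI backend (e.g. PHP-FPM)
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            fastcgi_pass app_fastcgi;
        }
    }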

How does load balancing work in Nginx?

A load balancer acts as the “traffic cop” sitting in front of your servers and routing client requests across all servers capable of fulfilling those requests in a manner that maximizes speed and capacity utilization and ensures that no one server is overworked, which could degrade performance.


1 Answer

The reason you get a 504 is that when Nginx performs its passive health check it simply tries to open a connection to the backend you configured. Because backend1 is powered down, nothing at that address can accept or actively refuse the connection, so the attempt hangs.

It takes some time for that connection attempt to time out, hence the 504 Gateway Timeout.

It's a different case when you only stop the application process: the port is no longer listening, but the host is still up, so the connection is refused immediately. That failure is detected very quickly and the instance is marked as unavailable.

To overcome this you can set fail_timeout=2s so the server is marked unavailable sooner; the default is 10 seconds.

https://nginx.org/en/docs/http/ngx_http_upstream_module.html#fail_timeout
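A sketch of how that could look in the config from the question, combined with a shorter connect timeout so an unreachable host is abandoned quickly; the 2s values and max_fails=3 are illustrative assumptions, not tested against this setup:

    upstream backends {
        # After max_fails failed attempts within fail_timeout, the server is
        # considered unavailable for the next fail_timeout seconds
        server xx.xx.xx.37:3000 max_fails=3 fail_timeout=2s;
        server xx.xx.xx.52:3000 max_fails=3 fail_timeout=2s;
        server xx.xx.xx.69:3000 max_fails=3 fail_timeout=2s;
    }

    location /api {
        proxy_pass http://backends;
        # The default connect timeout is 60s; lowering it means a powered-off
        # host is given up on quickly and the request is retried elsewhere
        proxy_connect_timeout 2s;
        # Retry the next upstream server on connection errors and timeouts
        proxy_next_upstream error timeout;
    }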

Answered Oct 19 '22 by Amjad Hussain Syed