Why does NGINX load balancer passive health check not detect when upstream server is offline?

I have an upstream block in an Nginx config file. This block lists several backend servers across which requests are load balanced.

...
upstream backend {
    server backend1.com;
    server backend2.com;
    server backend3.com;
}
...

Each of the above 3 backend servers is running a Node application.

If I stop the application process on backend1, Nginx recognises this via its passive health check and directs traffic only to backend2 and backend3, as expected.
However, if I power down the server on which backend1 is hosted, Nginx does not recognise that it is offline and keeps trying to send requests to it, resulting in a 504 error.

Can someone shed some light on why this (scenario 2 above) may happen and whether there is some further configuration I am missing?

Update: I'm beginning to wonder if the behaviour I'm seeing is because the above upstream block is located within an http {} Nginx context. If backend1 is powered down, this would be a connection error, so (maybe off the mark here, but just thinking aloud) should this be a TCP health check?

Update 2:

nginx.conf

user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 768;
    # multi_accept on;
}

http {


    upstream backends {
        server xx.xx.xx.37:3000 fail_timeout=2s;
        server xx.xx.xx.52:3000 fail_timeout=2s;
        server xx.xx.xx.69:3000 fail_timeout=2s;
    }

    ##
    # Basic Settings
    ##

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    # server_tokens off;

    # server_names_hash_bucket_size 64;
    # server_name_in_redirect off;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ##
    # SSL Settings
    ##
        ssl_certificate     …
        ssl_certificate_key …
        ssl_ciphers         …;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
    ssl_prefer_server_ciphers on;

    ##
    # Logging Settings
    ##

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ##
    # Gzip Settings
    ##

    gzip on;

    # gzip_vary on;
    # gzip_proxied any;
    # gzip_comp_level 6;
    # gzip_buffers 16 8k;
    # gzip_http_version 1.1;
    # gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    ##
    # Virtual Host Configs
    ##

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

default

server {
    listen 80;
    listen [::]:80;
    return 301 https://$host$request_uri;
    #server_name ...;
}
server {

    listen              443 ssl;
    listen              [::]:443 ssl;
    # SSL configuration
    ...
    # Add index.php to the list if you are using PHP
    index index.html index.htm;

    server_name _;

    location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ /index.html;
        #try_files $uri $uri/ =404;
    }

    location /api {
        rewrite /api/(.*) /$1  break;
        proxy_pass http://backends;
        proxy_redirect     off;
        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Host $server_name;
    }

    # Requests for socket.io are passed on to Node on port 3000
    location /socket.io/ {
        proxy_http_version 1.1;

        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_pass http://backends;
    }
}

asked Jul 23 '20 20:07

Kuyashii

1 Answer

You get a 504 because Nginx's passive health check only learns that a server is down when an attempt to proxy a real request to it fails. Since backend1 is powered down, nothing on that host answers at all: the connection attempt is neither accepted nor refused.

Nginx therefore has to wait for the connection attempt to time out (proxy_connect_timeout, 60 seconds by default) before it can count the attempt as failed, hence the 504 Gateway Timeout.

It's a different case when you stop only the application process. The host is still up but nothing is listening on the port, so the connection is refused immediately. Nginx detects this quickly and marks the server as unavailable.

To overcome this you can set fail_timeout=2s on each server. It controls both the window in which failed attempts are counted and how long the server is then considered unavailable. The default is 10 seconds.

https://nginx.org/en/docs/http/ngx_http_upstream_module.html#fail_timeout
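
As a rough sketch of how this could look with the upstream from the question: the 2s values below are illustrative assumptions rather than recommendations, and the location block is trimmed to the directives that matter here. proxy_connect_timeout and proxy_next_upstream are standard ngx_http_proxy_module directives that the original config does not set explicitly.

```nginx
upstream backends {
    # max_fails=1 is the default: one failed attempt within fail_timeout
    # takes the server out of rotation for the next fail_timeout period.
    server xx.xx.xx.37:3000 max_fails=1 fail_timeout=2s;
    server xx.xx.xx.52:3000 max_fails=1 fail_timeout=2s;
    server xx.xx.xx.69:3000 max_fails=1 fail_timeout=2s;
}

# Inside the existing "listen 443 ssl" server block:
location /api {
    rewrite /api/(.*) /$1  break;
    proxy_pass http://backends;

    # Assumed value: give up on a host that never answers after 2s
    # instead of the 60s default.
    proxy_connect_timeout 2s;

    # This is already the default; shown to make explicit that a
    # connection error or timeout passes the request to the next server.
    proxy_next_upstream error timeout;
}
```

With a short connect timeout, a powered-down host should be counted as failed after about 2 seconds rather than 60, which brings its detection time closer to the connection-refused case. The same directives would need to be repeated in the /socket.io/ location if it should behave the same way.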

answered Oct 19 '22 10:10

Amjad Hussain Syed
