Kubernetes Nginx: How to have zero-downtime deployments?

Tags:

nginx

termination

kubernetes

I am trying to run a Kubernetes nginx deployment with zero downtime. Part of that process has been to use a rollingUpdate strategy, which ensures that at least one pod is running nginx at all times. This works perfectly well.

I am running into errors when the old nginx pod is terminating. According to the Kubernetes docs on termination, Kubernetes will:

1. remove the pod from the endpoints list for the service, so it is not receiving any new traffic when termination begins
2. invoke a pre-stop hook if it is defined, and wait for it to complete
3. send SIGTERM to all remaining processes
4. send SIGKILL to any remaining processes after the grace period expires.

I understand that the command nginx -s quit is supposed to gracefully terminate nginx by waiting for all workers to complete requests before the master terminates. nginx shuts down gracefully in response to the SIGQUIT signal, while SIGTERM results in violent termination. Other forums say that it is as easy as adding the following preStop hook to your deployment:

lifecycle:
  preStop:
    exec:
      command: ["/usr/sbin/nginx", "-s", "quit"]

However, from testing this command I have found that nginx -s quit returns immediately instead of waiting for the workers to complete. It also does not print the PID of the master process, which is what I was hoping for.

What happens is that Kubernetes invokes nginx -s quit, which sends SIGQUIT to the nginx master (which in turn asks the workers to shut down gracefully) but does not wait for them to finish. Because the hook returns right away, Kubernetes jumps straight to step 3 and sends SIGTERM to those processes, resulting in violent termination and thus lost connections.
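
One way to observe this "signal sent, sender already returned" behaviour is a small helper that delivers a signal and immediately reports whether the target is still running. This is only a sketch: the function name alive_after_signal is made up for illustration, and it assumes the caller is allowed to signal the target process.

```shell
# alive_after_signal PID SIGNAL
# Sends SIGNAL to PID and prints "alive" if the process still exists
# immediately afterwards, or "gone" if it does not.
# (Hypothetical helper, not part of nginx or Kubernetes.)
alive_after_signal() {
  local pid sig
  pid="$1"
  sig="$2"
  # If the signal cannot be delivered, the process is already gone
  # (or we lack permission to signal it).
  if ! kill -s "$sig" "$pid" 2>/dev/null; then
    echo "gone"
    return 0
  fi
  # kill -0 delivers no signal; it only checks that the process exists.
  if kill -0 "$pid" 2>/dev/null; then
    echo "alive"
  else
    echo "gone"
  fi
}
```

Inside the pod, something like alive_after_signal "$(cat /run/nginx.pid)" QUIT would be expected to print "alive" while workers are still draining, which is the window in which the SIGTERM from step 3 can arrive.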

QUESTION: Has anyone figured out a good way to gracefully shut down their nginx controller during a rolling deployment and have zero downtime? A sleep workaround isn't good enough; I'm looking for something more robust.

Below is the full deployment yaml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-ingress-controller
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: nginx-ingress-lb
    spec:
      terminationGracePeriodSeconds: 60
      serviceAccount: nginx
      containers:
        - name: nginx-ingress-controller
          image: gcr.io/google_containers/nginx-ingress-controller:0.9.0-beta.8
          imagePullPolicy: Always
          readinessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
          livenessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 5
          args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/default-backend
            - --v=2
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          ports:
            - containerPort: 80
          lifecycle:
            preStop:
              exec:
                command: ["/usr/sbin/nginx", "-s", "quit"]

asked Jul 13 '17 14:07

Lindsay Landry

1 Answer

I hate answering my own questions, but after noodling a bit this is what I have so far.

I created a bash script that is semi-blocking, called killer:

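#!/bin/bash

# Give Kubernetes time to remove the pod from the service endpoints.
sleep 3
# Remember the master PID before asking nginx to quit.
PID=$(cat /run/nginx.pid)
nginx -s quit

# Block until the master process has exited.
while [ -d "/proc/$PID" ]; do
  sleep 0.1
done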

I found that inside the nginx pod there is a file /run/nginx.pid which holds the PID of the master process. If you call nginx -s quit and then wait until that process disappears, you have essentially made the quit command "blocking".
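
The wait itself has no upper bound, so a stuck master would keep the hook running until the grace period runs out. A hedged variant with a timeout is sketched below. The name wait_for_exit and the 50-second default are assumptions chosen for illustration (the default is picked to stay under the terminationGracePeriodSeconds: 60 in the deployment above), and it checks liveness with kill -0 rather than /proc, which assumes the hook may signal the nginx master.

```shell
# wait_for_exit PID [TIMEOUT_SECONDS]
# Polls until the process with the given PID no longer exists.
# Returns 0 if it exited within the timeout, 1 otherwise.
# (Hypothetical helper; the 50s default is an assumption.)
wait_for_exit() {
  local pid timeout deadline
  pid="$1"
  timeout="${2:-50}"
  deadline=$(( $(date +%s) + timeout ))
  # kill -0 delivers no signal; it only checks that the process exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep 0.1
  done
  return 0
}
```

In the killer script this could stand in for the while loop, for example wait_for_exit "$PID" 50 after nginx -s quit. The initial sleep plus the timeout should stay below the pod's grace period, since the time spent in the preStop hook counts against it.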

Note that there is a sleep 3 before anything happens. This is due to a race condition where Kubernetes marks a pod as terminating, but takes a little time (< 1s) to remove this pod from the service that points traffic toward it.

I have mounted this script into my pod and called it via the preStop directive. It mostly works, but during testing there are still occasional blips where I get a curl error that the connection was "reset by peer". Still, this is a step in the right direction.

answered Oct 14 '22 00:10

Lindsay Landry
