Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Intermittent 502s with nginx+gunicorn+django

Over the past few weeks we've been getting more and more 502 errors. Currently our stack is nginx + gunicron + django on an m1.large EC2 instance backed by a small RDS instance.

They seem to become more frequent as the request load increases. I will see the random 502 while using a browser, but our command line scripts that hit the api (Tasty Pie) will usually fail on their second or third request. However, if I add in a sleep function to the script right before it makes a request it will be okay for that request but 502 on the next. Note that we are using digest auth with the requests library and slumber wrapper -- hence the 401, 200 pattern.

To make debugging even trickier, the issue resolves itself when Gunicorn is run with the --debug option. The error still exists if I remove the --debug option but limit my Gunicorn workers to 1 explicitly.

My nginx.conf:

user www-data;
worker_processes 4;
pid /var/run/nginx.pid;

events {
    worker_connections 768;
    # multi_accept on;
}

http {

    ##
    # Basic Settings
    ##

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    # server_tokens off;

    # server_names_hash_bucket_size 64;
    # server_name_in_redirect off;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ##
    # Logging Settings
    ##

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ##
    # Gzip Settings
    ##

    gzip on;
    gzip_disable "msie6";

        gzip_proxied any;
        gzip_types application/x-ghi-packedschemafeatures-v1
        gzip_http_version 1.1;
        gzip_comp_level 1;
        gzip_min_length 500;

        proxy_buffering on;
        proxy_http_version 1.1;

    ##
    # Virtual Host Configs
    ##

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

Virtual Host File:

    server {
      listen 80;
      server_name pipeline.ourdomain.com;
      location / {
        rewrite ^ https://$server_name$request_uri permanent;
      }
}

server {
      listen 443;
      server_name pipeline.ourdomain.com;
      ssl on;
      ssl_protocols SSLv3 TLSv1;
      ssl_ciphers ALL:-ADH:+HIGH:+MEDIUM:-LOW:-SSLv2:-EXP;
      ssl_session_cache shared:SSL:10m;
      ssl_certificate     /etc/ssl/certs/ourdomain.com.combined.crt;
      ssl_certificate_key /etc/ssl/private/ourdomain.com.key;

      root /var/www/;
      location /static/ {
               alias /var/www/production/pipeline/public/;
      }

      location / {
             proxy_pass_header Server;
             proxy_set_header Host $http_host;
             proxy_redirect off;
             proxy_set_header X-Real-IP $remote_addr;
             proxy_set_header X-Scheme $scheme;
             proxy_set_header X-Forwarded-Protocol https;
             proxy_connect_timeout 240;
             proxy_read_timeout 280;
             proxy_pass http://localhost:8000/;
      }
      error_page 500 502 503 504 /static/50x.html;



}

Gunicorn Command

#!/bin/bash
set -e
LOGFILE=/var/log/gunicorn/ea_pipeline.log
LOGDIR=$(dirname $LOGFILE)

SETTINGS=production_settings
# user/group to run as
USER=ubuntu
GROUP=ubuntu
DJANGO_PATH=$(dirname $(readlink -f $0))/../
cd $DJANGO_PATH
echo $(pwd)
. ../env/bin/activate
test -d $LOGDIR || mkdir -p $LOGDIR
exec ../env/bin/gunicorn_django \
     --user=$USER --group=$GROUP --log-level=debug \
     --preload \
     --workers=4 \
     --timeout=90 \
     --settings=$SETTINGS \
     --limit-request-line=8190 \
     --limit-request-field_size 0 \
     --pythonpath=$DJANGO_PATH \
     --log-file=$LOGFILE production_settings.py 2>>$LOGFILE

Sample of the Access Log:

67.134.170.194 - - [24/Aug/2012:00:28:17 +0000] "GET /api/v1/storage/ HTTP/1.1" 401 5 "-" "python-requests/0.13.8 CPython/2.7.3 Linux/3.2.0-29-generic"
67.134.170.194 - - [24/Aug/2012:00:28:18 +0000] "GET /api/v1/storage/ HTTP/1.1" 200 326 "-" "python-requests/0.13.8 CPython/2.7.3 Linux/3.2.0-29-generic"
67.134.170.194 - - [24/Aug/2012:00:28:18 +0000] "GET /api/v1/customer/?client_id=lamb_01 HTTP/1.1" 502 18 "-" "python-requests/0.13.8 CPython/2.7.3 Linux/3.2.0-29-generic"
67.134.170.194 - - [24/Aug/2012:00:29:41 +0000] "GET /api/v1/storage/ HTTP/1.1" 502 18 "-" "python-requests/0.13.8 CPython/2.7.3 Linux/3.2.0-29-generic"

Nginx Error log:

2012/08/24 00:28:18 [error] 16490#0: *3 connect() failed (111: Connection refused) while connecting to upstream, client: 67.134.170.194, server: pipeline.ourdomain.com, request: "GET /api/v1/customer/?client_id=lamb_01 HTTP/1.1", upstream: "http://127.0.0.1:8000/api/v1/customer/?client_id=lamb_01", host: "pipeline.ourdomain.com"
2012/08/24 00:29:41 [error] 16490#0: *7 connect() failed (111: Connection refused) while connecting to upstream, client: 67.134.170.194, server: pipeline.ourdomain.com, request: "GET /api/v1/storage/ HTTP/1.1", upstream: "http://127.0.0.1:8000/api/v1/storage/", host: "pipeline.ourdomain.com"

Sample of the Gunicorn log:

2012-08-24 17:03:13 [8716] [INFO] Starting gunicorn 0.14.3
2012-08-24 17:03:13 [8716] [DEBUG] Arbiter booted
2012-08-24 17:03:13 [8716] [INFO] Listening at: http://127.0.0.1:8000 (8716)
2012-08-24 17:03:13 [8716] [INFO] Using worker: sync
2012-08-24 17:03:13 [8735] [INFO] Booting worker with pid: 8735
2012-08-24 17:03:13 [8736] [INFO] Booting worker with pid: 8736
2012-08-24 17:03:13 [8737] [INFO] Booting worker with pid: 8737
2012-08-24 17:03:13 [8738] [INFO] Booting worker with pid: 8738
2012-08-24 17:03:21 [8738] [DEBUG] GET /api/v1/storage/
Assertion failed: ok (mailbox.cpp:84)
2012-08-24 17:03:21 [8738] [INFO] Parent changed, shutting down: <Worker 8738>
2012-08-24 17:03:21 [8738] [INFO] Worker exiting (pid: 8738)
Error in sys.exitfunc:
2012-08-24 17:03:21 [8737] [DEBUG] GET /api/v1/storage/
2012-08-24 17:03:22 [8838] [INFO] Starting gunicorn 0.14.3
2012-08-24 17:03:22 [8838] [ERROR] Connection in use: ('127.0.0.1', 8000)
2012-08-24 17:03:22 [8838] [ERROR] Retrying in 1 second.
2012-08-24 17:03:22 [8737] [INFO] Parent changed, shutting down: <Worker 8737>
2012-08-24 17:03:22 [8737] [INFO] Worker exiting (pid: 8737)
Error in sys.exitfunc:
2012-08-24 17:03:22 [8736] [DEBUG] GET /api/v1/customer/
2012-08-24 17:03:23 [8736] [INFO] Parent changed, shutting down: <Worker 8736>
2012-08-24 17:03:23 [8736] [INFO] Worker exiting (pid: 8736)
Error in sys.exitfunc:
2012-08-24 17:03:23 [8838] [ERROR] Connection in use: ('127.0.0.1', 8000)
2012-08-24 17:03:23 [8838] [ERROR] Retrying in 1 second.
2012-08-24 17:03:24 [8735] [DEBUG] GET /api/v1/upload_action/
2012-08-24 17:03:24 [8838] [ERROR] Connection in use: ('127.0.0.1', 8000)
2012-08-24 17:03:24 [8838] [ERROR] Retrying in 1 second.
2012-08-24 17:03:24 [8735] [INFO] Parent changed, shutting down: <Worker 8735>
2012-08-24 17:03:24 [8735] [INFO] Worker exiting (pid: 8735)
Error in sys.exitfunc:
2012-08-24 17:03:25 [8838] [DEBUG] Arbiter booted
2012-08-24 17:03:25 [8838] [INFO] Listening at: http://127.0.0.1:8000 (8838)
2012-08-24 17:03:25 [8838] [INFO] Using worker: sync
2012-08-24 17:03:25 [8907] [INFO] Booting worker with pid: 8907
2012-08-24 17:03:25 [8908] [INFO] Booting worker with pid: 8908
2012-08-24 17:03:25 [8909] [INFO] Booting worker with pid: 8909
2012-08-24 17:03:25 [8910] [INFO] Booting worker with pid: 8910
like image 441
AndrewJesaitis Avatar asked Aug 24 '12 17:08

AndrewJesaitis


1 Answers

This is a very old post. But I had exactly the same problem with a NGinx+Gunicorn+Flask setup. I had as well a 502 error with the same log as yours on every 300th request or so. Changing the gunicorn worker type to an asynchronous solved the problem for me (I picked gthread). Hope this answer will help someone.

How to change the setting: http://docs.gunicorn.org/en/stable/settings.html#worker-class

How to choose your worker type: http://docs.gunicorn.org/en/latest/design.html#choosing-a-worker-type

And here a good explanation why: How many concurrent requests does a single Flask process receive?

like image 118
Si Mon Avatar answered Nov 01 '22 22:11

Si Mon