
Bad gateway errors at load on nginx + Unicorn (Rails 3 app)

I have a Rails (3.2) app that runs on nginx and unicorn on a cloud platform. The "box" runs Ubuntu 12.04.

When the system load is at about 70% or above, nginx abruptly (and seemingly randomly) starts throwing 502 Bad Gateway errors; below that load there is nothing of the sort. I have experimented with various numbers of cores (4, 6, 10 - I can "change hardware" since it's a cloud platform), and the situation is always the same. (CPU load is similar to system load; userland is around 55%, the rest being system and stolen, with plenty of free memory and no swapping.)

502s usually come in batches, but not always.

(I run one unicorn worker per core, and one or two nginx workers. See the relevant parts of the configs below when running on 10 cores.)

I don't really know how to track down the cause of these errors. I suspect it may have something to do with the unicorn workers not being able to serve requests (in time?), but that looks odd because they do not seem to saturate the CPU, and I see no reason why they would be waiting on IO (though I don't know how to make sure of that either).

Can you please help me work out how to go about finding the cause?
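
(For context, the sort of generic checks one could run here - these are standard Linux tools, mentioned only as a starting point:)

vmstat 1      # run queue length, CPU breakdown and swap activity under load
iostat -x 1   # per-device utilization, to see whether the workers are waiting on IO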


Unicorn config (unicorn.rb):

worker_processes 10
working_directory "/var/www/app/current"
listen "/var/www/app/current/tmp/sockets/unicorn.sock", :backlog => 64
listen 2007, :tcp_nopush => true
timeout 90
pid "/var/www/app/current/tmp/pids/unicorn.pid"
stderr_path "/var/www/app/shared/log/unicorn.stderr.log"
stdout_path "/var/www/app/shared/log/unicorn.stdout.log"
preload_app true
GC.respond_to?(:copy_on_write_friendly=) and
  GC.copy_on_write_friendly = true
check_client_connection false

before_fork do |server, worker|
  ... I believe the stuff here is irrelevant ...
end
after_fork do |server, worker|
  ... I believe the stuff here is irrelevant ...
end

And the nginx config:

/etc/nginx/nginx.conf:

worker_processes 2;
worker_rlimit_nofile 2048;
user www-data www-admin;
pid /var/run/nginx.pid;
error_log /var/log/nginx/nginx.error.log info;

events {
  worker_connections 2048;
  accept_mutex on; # "on" if nginx worker_processes > 1
  use epoll;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';
    access_log  /var/log/nginx/access.log  main;
    # optimization efforts
    client_max_body_size        2m;
    client_body_buffer_size     128k;
    client_header_buffer_size   4k;
    large_client_header_buffers 10 4k;  # one for each core or one for each unicorn worker?
    client_body_temp_path       /tmp/nginx/client_body_temp;

    include /etc/nginx/conf.d/*.conf;
}

/etc/nginx/conf.d/app.conf:

sendfile on;
tcp_nopush on;
tcp_nodelay off;
gzip on;
gzip_http_version 1.0;
gzip_proxied any;
gzip_min_length 500;
gzip_disable "MSIE [1-6]\.";
gzip_types text/plain text/css text/javascript application/x-javascript;

upstream app_server {
  # fail_timeout=0 means we always retry an upstream even if it failed
  # to return a good HTTP response (in case the Unicorn master nukes a
  # single worker for timing out).
  server unix:/var/www/app/current/tmp/sockets/unicorn.sock fail_timeout=0;
}

server {
  listen 80 default deferred;
  server_name _;
  client_max_body_size 1G;
  keepalive_timeout 5;
  root /var/www/app/current/public;

  location ~ "^/assets/.*" {
      ...
  }

  # Prefer to serve static files directly from nginx to avoid unnecessary
  # data copies from the application server.
  try_files $uri/index.html $uri.html $uri @app;

  location @app {
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_redirect off;

    proxy_pass http://app_server;

    proxy_connect_timeout      90;
    proxy_send_timeout         90;
    proxy_read_timeout         90;

    proxy_buffer_size          128k;
    proxy_buffers              10 256k;  # one per core or one per unicorn worker?
    proxy_busy_buffers_size    256k;
    proxy_temp_file_write_size 256k;
    proxy_max_temp_file_size   512k;
    proxy_temp_path            /mnt/data/tmp/nginx/proxy_temp;

    open_file_cache max=1000 inactive=20s; 
    open_file_cache_valid    30s; 
    open_file_cache_min_uses 2;
    open_file_cache_errors   on;
  }
}
asked Mar 18 '13 by fastcatch


1 Answer

After googling for expressions found in the nginx error log, it turned out to be a known issue that has nothing to do with nginx, little to do with unicorn, and is rooted in OS (Linux) settings.
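
(As an illustration, the kind of search meant here - the log path is the one set in nginx.conf above, and the exact wording of the messages varies:)

grep upstream /var/log/nginx/nginx.error.log | tail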

The core of the problem is that the socket backlog is too short. There are various considerations as to how long it should be (whether you want to detect cluster member failure as soon as possible, or keep pushing the application to its load limits), but in any case the listen :backlog needs tweaking.
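
For illustration, the change amounts to raising :backlog on the Unix-socket listener in unicorn.rb, e.g.:

listen "/var/www/app/current/tmp/sockets/unicorn.sock", :backlog => 2048  # up from the original 64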

I found that in my case a listen ... :backlog => 2048 was sufficient. (I did not experiment much; there is a neat hack if you want to, though: have nginx and unicorn communicate over two sockets with different backlogs, the longer one being the backup, and then watch the nginx log to see how often the shorter queue fails.) Please note that this is not the result of a scientific calculation and YMMV.
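
(A minimal sketch of that hack; the name of the second socket is purely illustrative:)

unicorn.rb:

listen "/var/www/app/current/tmp/sockets/unicorn.sock", :backlog => 64        # short queue, overflows first
listen "/var/www/app/current/tmp/sockets/unicorn_long.sock", :backlog => 2048 # long queue, only used as backup

nginx upstream:

upstream app_server {
  server unix:/var/www/app/current/tmp/sockets/unicorn.sock fail_timeout=0;
  server unix:/var/www/app/current/tmp/sockets/unicorn_long.sock fail_timeout=0 backup;
}

Entries for the backup socket in the nginx error log then tell you how often the short queue overflowed.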

Note, however, that many OSes (most Linux distros, Ubuntu 12.04 included) have much lower OS-level default limits on socket backlog sizes (as low as 128).
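
(You can check the current values like this:)

sysctl net.core.somaxconn
sysctl net.core.netdev_max_backlog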

You can change the OS limits as follows (as root):

sysctl -w net.core.somaxconn=2048
sysctl -w net.core.netdev_max_backlog=2048

Add these to /etc/sysctl.conf to make the changes permanent. (/etc/sysctl.conf can be reloaded without rebooting with sysctl -p.)
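
For example, the corresponding entries in /etc/sysctl.conf look like this:

net.core.somaxconn = 2048
net.core.netdev_max_backlog = 2048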

There are mentions that you may also have to increase the maximum number of files a process can open (use ulimit -n, and /etc/security/limits.conf to make it permanent). I had already done that for other reasons, so I cannot tell whether it makes a difference or not.
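
(A sketch of what that could look like; the user name and the values here are illustrative, not taken from my setup:)

ulimit -n   # show the current per-process open-file limit

/etc/security/limits.conf:

www-data  soft  nofile  4096
www-data  hard  nofile  8192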

answered Oct 25 '22 by fastcatch