We have a big EC2 instance with 32 cores, currently running Nginx, Tornado and Redis, serving on average 5K requests per second. Everything seems to work fine, but CPU load is already reaching 70% and we have to support even more requests. One of the thoughts was to replace Tornado with uWSGI, because we don't really use the async features of Tornado.
Our application consists of one function: it receives a JSON payload (~4 KB), does some blocking but very fast work (Redis) and returns JSON.
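For context, a minimal sketch of what such a handler might look like as a plain WSGI app (the real code isn't shown; `redis_lookup` is a hypothetical stand-in for the actual Redis call):

```python
import json

def redis_lookup(payload):
    # Hypothetical stand-in for the fast blocking Redis call.
    return {"result": payload.get("key", "unknown")}

def application(environ, start_response):
    """WSGI entry point: read the JSON body, do fast blocking work, return JSON."""
    size = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(size) or b"{}")
    body = json.dumps(redis_lookup(payload)).encode()
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

uWSGI would load this via `wsgi-file = app.py`, calling `application` for each request.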
We thought the speed improvement would come from the uwsgi protocol: we could install Nginx on a separate server and proxy all requests to uWSGI over the uwsgi protocol. But after trying every possible configuration and changing OS parameters, we still can't get it working even under the current load. Most of the time the nginx log contains 499 and 502 errors. In some configurations it simply stopped receiving new requests, as if it had hit some OS limit.
So, as I said: we have 32 cores, 60 GB of free memory and a very fast network. We don't do heavy work, only very fast blocking operations. What is the best strategy in this case: processes, threads, async? What OS parameters should be set?
Current configuration is:
[uwsgi]
master = 2
processes = 100
socket = /tmp/uwsgi.sock
wsgi-file = app.py
daemonize = /dev/null
pidfile = /tmp/uwsgi.pid
listen = 64000
stats = /tmp/stats.socket
cpu-affinity = 1
max-fd = 20000
memory-report = 1
gevent = 1000
thunder-lock = 1
threads = 100
post-buffering = 1
Nginx config:
user www-data;
worker_processes 10;
pid /run/nginx.pid;
events {
worker_connections 1024;
multi_accept on;
use epoll;
}
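For reference, the http-level part of the nginx config (not shown above) that proxies to uWSGI over the uwsgi protocol would look roughly like this sketch; the socket path matches the [uwsgi] config above, everything else is an assumption:

```nginx
http {
    upstream uwsgi_app {
        # Unix socket from the [uwsgi] config above
        server unix:/tmp/uwsgi.sock;
    }
    server {
        listen 80;
        location / {
            include uwsgi_params;   # standard uwsgi protocol variables
            uwsgi_pass uwsgi_app;
        }
    }
}
```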
OS config:
sysctl net.core.somaxconn
net.core.somaxconn = 64000
I know the limits are too high; I started trying every possible value.
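When a socket server stops accepting new connections under load, the usual suspects are the listen backlog and file-descriptor limits. A sketch of the knobs involved (values are illustrative, not recommendations):

```shell
# The backlog passed to listen() is silently capped by net.core.somaxconn,
# so uWSGI's "listen" setting should not exceed it.
sysctl -w net.core.somaxconn=40000
sysctl -w net.ipv4.tcp_max_syn_backlog=40000

# File-descriptor limits: each open connection consumes one descriptor.
sysctl -w fs.file-max=200000
ulimit -n 200000   # per-process limit for the shell that starts the services
```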
UPDATE:
I ended up with the following configuration:
[uwsgi]
chdir = %d
master = 1
processes = %k
socket = /tmp/%c.sock
wsgi-file = app.py
lazy-apps = 1
touch-chain-reload = %dreload
virtualenv = %d.env
daemonize = /dev/null
pidfile = /tmp/%c.pid
listen = 40000
stats = /tmp/stats-%c.socket
cpu-affinity = 1
max-fd = 200000
memory-report = 1
post-buffering = 1
threads = 2
Note on threads: by default uWSGI does not enable threading support within the Python interpreter core, so it is not possible to create background threads from Python code. Threading is enabled with the --enable-threads option, which is applied automatically when you specify the --threads option to configure the number of threads.
I think your request handling roughly breaks down as follows: nginx accepts the request and forwards it to a uWSGI worker; the worker parses the JSON, makes a blocking Redis call and serializes the JSON response.
You could benchmark the handling time on a near-idle system. My hunch is that the round trip would boil down to 2 or 3 milliseconds. At 70% CPU load this would go up to about 4 or 5 ms (not counting time spent in the nginx request queue, just the handling in the uWSGI worker).
At 5k req/s your average in-process request count would be in the 20-25 range. A decent match for your VM.
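That estimate is just Little's law (in-flight requests = arrival rate × service time); a quick check with the numbers above:

```python
# Little's law: L = lambda * W
rate = 5000             # requests per second (from the question)
service_time = 4.5e-3   # seconds per request: the ~4-5 ms estimate above

in_flight = rate * service_time
print(in_flight)  # 22.5 concurrent in-process requests
```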
The next step is to balance the CPU cores. If you have 32 cores, it does not make sense to allocate 1000 worker processes; you might end up choking the system on context-switching overhead. A good balance keeps the total number of workers (nginx + uWSGI + redis) of the same order of magnitude as the available CPU cores, maybe with a little extra to cover blocking I/O (e.g. filesystem access, but mainly network requests to other hosts such as a DBMS). If blocking I/O becomes a big part of the equation, consider rewriting into asynchronous code and integrating an async stack.
First observation: you're allocating 10 workers to nginx, yet the CPU time nginx spends on a request is much lower than the time uWSGI spends on it. I would start by dedicating about 10% of the system to nginx (3 or 4 worker processes).
The remainder would have to be split between uWSGI and redis. I don't know the size of your indices in redis, or the complexity of your Python code, but my first attempt would be a 75%/25% split between uWSGI and redis. That would put redis at about 6 workers and uWSGI at about 20 workers plus a master.
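The split above is simple arithmetic over the 32 cores; rounding differs slightly from the figures in the text, and the exact numbers are a starting point, not a tuning target:

```python
cores = 32

nginx_workers = 4                          # ~10% of the system (3-4 workers)
remaining = cores - nginx_workers          # 28 cores left for uWSGI + redis

uwsgi_workers = remaining * 3 // 4         # 75% share (plus 1 uWSGI master)
redis_workers = remaining - uwsgi_workers  # 25% share

print(nginx_workers, uwsgi_workers, redis_workers)
```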
As for the threads option in the uWSGI configuration: thread switching is lighter than process switching, but if a significant part of your Python code is CPU-bound it won't help because of the GIL. The threads option is mainly interesting when a significant part of your handling time is blocked on I/O. You could disable threads, or try workers = 10, threads = 2 as an initial attempt.
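As an ini sketch, that starting point would look like this (only the worker-related keys shown; the rest of your existing config stays as-is):

```ini
[uwsgi]
master = 1
processes = 10
threads = 2     ; 2 threads per worker to cover brief I/O waits
```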