
Flask and/or Tornado - handling time consuming call to external webservice

I've got a Flask app that connects to external services at a given URL (response times vary, but are usually long) and searches for some data there. After that, it runs some CPU-heavy operations on the retrieved data, which takes some time too.

My problem: the response from the external service may take some time. I can't do much about that, but it becomes a big problem when there are multiple requests at once: the Flask request to the external service blocks the thread, and everything else waits.

It's an obvious waste of time and it's killing the app.

I heard about an asynchronous library called Tornado. Here are my questions:

  1. Does that mean it can handle multiple requests and just trigger a callback right after the response from the external service arrives?
  2. Can I achieve that with my current Flask app (probably not, because of WSGI, I guess?), or do I need to rewrite the whole app in Tornado?
  3. What about those CPU-heavy operations: would they block my thread? It's a good idea to do some load balancing anyway, but I'm curious how Tornado handles that.
  4. Possible traps, gotchas?
Jankiel asked Sep 30 '14 22:09


2 Answers

The web server built into Flask isn't meant to be used in production, for exactly the reasons you're listing: it's single threaded, and easily bogged down if any request blocks for a non-trivial amount of time. The Flask documentation lists several options for deploying it in a production environment: mod_wsgi, gunicorn, uWSGI, etc. All of those deployment options provide mechanisms for handling concurrency, either via threads, processes, or non-blocking I/O. Note, though, that if you're doing CPU-bound operations, the only option that will give true concurrency is to use multiple processes.
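As a rough sketch of the multi-process option, a gunicorn config file could look like the one below. The setting names (bind, workers, threads, timeout) are gunicorn's own; the values are illustrative assumptions, not tuned recommendations.

```python
# gunicorn.conf.py: illustrative values only (assumptions, not recommendations)
import multiprocessing

# Address and port the workers listen on (assumed; usually behind a reverse proxy).
bind = "127.0.0.1:8000"

# One worker process per core, so CPU-bound requests can run in parallel.
workers = multiprocessing.cpu_count()

# A few threads per worker, so slow external calls don't tie up a whole process.
threads = 4

# Seconds a worker may stay silent before gunicorn restarts it;
# raised here on the assumption that the external services are slow.
timeout = 120
```

It would be started with something like gunicorn -c gunicorn.conf.py myapp:app, where myapp:app is a hypothetical module and application name.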

If you want to use tornado, you'll need to rewrite your application in the tornado style. Because its architecture is based on explicit asynchronous I/O, you can't use its asynchronous features if you deploy it as a WSGI application. The "tornado style" basically means using non-blocking APIs for all I/O operations, and using sub-processes for handling any long-running CPU-bound operations. The tornado documentation covers how to make asynchronous I/O calls, but here's a basic example of how it works:

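from tornado import gen
from tornado.httpclient import AsyncHTTPClient

@gen.coroutine
def fetch_coroutine(url):
    http_client = AsyncHTTPClient()
    # Suspend here until the HTTP response arrives; the event loop keeps running.
    response = yield http_client.fetch(url)
    return response.body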

The response = yield http_client.fetch(url) call is actually asynchronous; it returns control to the tornado event loop when the request begins, and resumes once the response is received. This allows multiple asynchronous HTTP requests to run concurrently, all within one thread. Do note, though, that anything you do inside of fetch_coroutine that isn't asynchronous I/O will block the event loop, and no other requests can be handled while that code is running.
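To make the "concurrently, all within one thread" part concrete, here is a minimal sketch that uses the standard library's asyncio instead of tornado so it runs without a network. fake_fetch is a hypothetical stand-in for an HTTP call, with asyncio.sleep playing the role of the slow external service.

```python
import asyncio
import time


async def fake_fetch(url, delay=0.2):
    # Stand-in for a non-blocking HTTP request (assumption): while this
    # coroutine is suspended, the event loop is free to run other coroutines.
    await asyncio.sleep(delay)
    return "body of " + url


async def fetch_all(urls):
    # Start every fetch at once and wait for all of them to finish.
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


def timed_fetch(urls):
    # Return the results together with the wall-clock time they took.
    start = time.perf_counter()
    bodies = asyncio.run(fetch_all(urls))
    return bodies, time.perf_counter() - start
```

Three fetches of 0.2 seconds each should finish in roughly 0.2 seconds rather than 0.6, because the waits overlap in a single thread.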

To deal with long-running CPU-bound operations, you need to send the work to a subprocess to avoid blocking the event loop. For Python, that generally means using either multiprocessing or concurrent.futures. I'd take a look at this question for more information on how best to integrate those libraries with tornado. Do note that you won't want to maintain a process pool larger than the number of CPUs you have on the system, so consider how many concurrent CPU-bound operations you expect to be running at any given time when you're figuring out how to scale this beyond a single machine.
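A minimal sketch of that idea, assuming concurrent.futures and the asyncio event loop's run_in_executor (tornado's IOLoop offers a run_in_executor method as well). cpu_heavy is a hypothetical placeholder for the real calculation.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def cpu_heavy(n):
    # Placeholder for the real CPU-bound work (hypothetical).
    return sum(i * i for i in range(n))


async def offload(executor, n):
    # Run cpu_heavy in the executor and suspend this coroutine until it
    # finishes, so the event loop can keep serving other requests.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, cpu_heavy, n)


async def main():
    # With no max_workers given, the pool size defaults to the CPU count.
    with ProcessPoolExecutor() as pool:
        results = await asyncio.gather(
            *(offload(pool, n) for n in (10, 100, 1000))
        )
    print(results)
```

To try it, call asyncio.run(main()) under an if __name__ == "__main__": guard, which multiprocessing needs on platforms that spawn worker processes. The function passed to the pool has to be defined at module level so it can be pickled.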

The tornado documentation has a section dedicated to running behind a load balancer, as well. They recommend using NGINX for this purpose.

dano answered Feb 15 '23 20:02



Tornado seems a better fit for this task than Flask. A subclass of tornado.web.RequestHandler running on a tornado.ioloop.IOLoop instance should give you non-blocking request handling. I expect it would look something like this.

import json

import tornado.ioloop
import tornado.web


class Handler(tornado.web.RequestHandler):
    def post(self):
        # Respond to POST / with a small JSON document.
        self.write(json.dumps({'aaa': 'bbbbb'}))


if __name__ == '__main__':
    app = tornado.web.Application([('/', Handler)])
    # Binding to port 80 usually requires elevated privileges.
    app.listen(80, address='0.0.0.0')
    # Start the event loop; this call blocks until the loop is stopped.
    tornado.ioloop.IOLoop.current().start()

If you want your post handler to be asynchronous, you could decorate it with tornado.gen.coroutine and make the outgoing calls with AsyncHTTPClient or grequests. This will give you non-blocking requests. You could potentially put your calculations in a coroutine as well, though I'm not entirely sure.
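On the question of calculations inside a coroutine, the following sketch may help check the behaviour. It uses asyncio from the standard library rather than tornado, and time.sleep as an assumed stand-in for CPU-heavy work. A ticker coroutine records when it gets to run, and the largest gap between ticks shows how long the loop was stalled.

```python
import asyncio
import time


async def ticker(ticks, interval=0.05, count=4):
    # Record a timestamp every time the event loop lets this coroutine run.
    for _ in range(count):
        ticks.append(time.perf_counter())
        await asyncio.sleep(interval)


async def blocking_calculation(duration=0.3):
    # time.sleep never yields to the event loop, much like a long
    # computation written directly inside a coroutine (assumption).
    time.sleep(duration)


async def cooperative_wait(duration=0.3):
    # Awaiting hands control back to the loop while waiting.
    await asyncio.sleep(duration)


async def largest_gap(work):
    # Run the ticker next to `work` and return the longest pause between ticks.
    ticks = []
    await asyncio.gather(ticker(ticks), work())
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))
```

With blocking_calculation the largest gap comes out at about the full 0.3 seconds, while with cooperative_wait it stays near the 0.05 second tick interval. That suggests that wrapping a calculation in a coroutine does not by itself stop it from blocking other requests.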

ragingSloth answered Feb 15 '23 20:02
