
Python async and CPU-bound tasks?

I have recently been working on a pet project in Python using Flask. It is a simple pastebin with server-side syntax highlighting provided by Pygments. Because highlighting is costly, I delegated it to a Celery task queue, and the request handler waits for the task to finish. Needless to say, this does no more than shift the CPU load to another worker, because waiting for the result still ties up the connection to the web server. Despite my instincts telling me to avoid premature optimization like the plague, I couldn't help looking into async.

Async

If you have been following Python web development lately, you have surely seen that async is everywhere. What async does is bring back cooperative multitasking, meaning each "thread" decides when and where to yield to another. This non-preemptive approach is more efficient than OS threads, but it still has its drawbacks. At the moment there seem to be two major approaches:

  • event/callback style multitasking
  • coroutines

The first one provides concurrency through loosely-coupled components executed in an event loop. Although this is safer with respect to race conditions and provides for more consistency, it is considerably less intuitive and harder to code than preemptive multitasking.

The second is a more traditional solution, closer to the threaded programming style: the programmer only has to switch context manually. Although more prone to race conditions and deadlocks, it provides an easy drop-in solution.
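To make "each thread decides when to yield" concrete, here is a minimal sketch of cooperative multitasking built from plain generators. The names `worker` and `run_round_robin` are made up for illustration; real libraries such as gevent hide this loop behind greenlets, so treat it as a toy model rather than how any particular library is implemented.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: does one step of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # explicit context switch back to the scheduler


def run_round_robin(tasks):
    """Run generator-based tasks in turn until all of them finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)      # resume the task until its next yield
        except StopIteration:
            continue        # task finished; do not reschedule it
        ready.append(task)  # task yielded; put it at the back of the queue


log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

Nothing here is preempted: a task that never reaches its `yield` would starve every other task, which is exactly the problem CPU-bound code causes.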

Most async work at the moment targets what are known as IO-bound tasks: tasks that block while waiting for input or output. This is usually accomplished with polling and timeout-based functions; if a call reports that the operation is not ready yet, context can be switched to another task.

Despite the name, this could be applied to CPU-bound tasks too: they can be delegated to another worker (thread, process, etc.) and then waited on without blocking. Ideally, these tasks would be written in an async-friendly manner, but realistically this would mean separating the code into chunks small enough not to block, preferably without scattering context switches after every line of code. This is especially inconvenient for existing synchronous libraries.
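The "small enough chunks" idea might look roughly like the sketch below. It uses the standard library's asyncio as a stand-in so that it runs without extra packages; with gevent, the analogous cooperative yield would be `gevent.sleep(0)`. The function names and the byte-summing workload are invented for illustration.

```python
import asyncio


async def checksum_in_chunks(data, chunk_size, events):
    """Sum the bytes of `data`, yielding to the event loop after every chunk."""
    total = 0
    for start in range(0, len(data), chunk_size):
        total += sum(data[start:start + chunk_size])  # CPU-bound slice of work
        events.append("chunk")
        await asyncio.sleep(0)  # cooperative yield; lets other tasks run
    return total


async def heartbeat(events, beats):
    """Stand-in for other requests that must keep being served."""
    for _ in range(beats):
        events.append("beat")
        await asyncio.sleep(0)


async def main():
    events = []
    total, _ = await asyncio.gather(
        checksum_in_chunks(bytes(range(10)), 4, events),
        heartbeat(events, 3),
    )
    return total, events


total, events = asyncio.run(main())
print(total)   # 45
print(events)  # "beat" entries appear between "chunk" entries
```

The trade-off is visible even in this toy: the yield point has to be written into the computation itself, which is why this approach is awkward to retrofit onto an existing synchronous library.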


Because of its convenience, I settled on gevent for async work, and I was wondering how CPU-bound tasks should be dealt with in an async environment (using futures, Celery, etc.).

How do I use an async execution model (gevent in this case) with a traditional web framework such as Flask? What are some commonly agreed-upon solutions to these problems in Python (futures, task queues)?

EDIT: To be more specific: how do I use gevent with Flask, and how do I deal with CPU-bound tasks in this context?

EDIT2: Since Python's GIL prevents threaded code from executing CPU-bound work in parallel, this leaves only the multiprocessing option, in my case at least. That means either using concurrent.futures or some external service that does the processing (which opens the door to something language-agnostic). What are some popular or often-used solutions for this with gevent (e.g. Celery)? I'm looking for best practices.
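As a rough illustration of the concurrent.futures route, the sketch below hands a CPU-bound function to an executor and awaits the result without blocking the event loop. It uses the standard library's asyncio rather than gevent, so it shows the pattern, not a gevent integration. `count_primes` is a hypothetical stand-in for the highlighting work, and `handle_request` and `serve` are illustrative names.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """Deliberately CPU-bound stand-in for syntax highlighting."""
    return sum(
        1 for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )


async def handle_request(executor, limit):
    """Await the worker's result without blocking the event loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, count_primes, limit)


async def serve(executor, limits):
    """Handle several 'requests' concurrently on one event loop."""
    return await asyncio.gather(*(handle_request(executor, n) for n in limits))


def main():
    # Worker processes sidestep the GIL. When running this as a script,
    # call main() behind the usual `if __name__ == "__main__":` guard.
    with ProcessPoolExecutor(max_workers=2) as pool:
        print(asyncio.run(serve(pool, [10, 100, 1000])))
```

Because the executor is passed in as a parameter, a `ThreadPoolExecutor` can be substituted while experimenting; it keeps the loop responsive but does not run the computation in parallel under the GIL.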

nikitautiu asked Apr 12 '13 10:04



2 Answers

It should be thread-safe to do something like the following to move CPU-intensive tasks into background threads:

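from threading import Thread

from flask_mail import Message  # Flask-Mail extension

# Assumption: `mail` is an already configured flask_mail.Mail instance.

def send_async_email(msg):
    # Runs in the background thread, so the request handler is not blocked.
    mail.send(msg)

def send_email(subject, sender, recipients, text_body, html_body):
    msg = Message(subject, sender=sender, recipients=recipients)
    msg.body = text_body
    msg.html = html_body
    # Fire and forget: start the thread and return immediately.
    thr = Thread(target=send_async_email, args=[msg])
    thr.start()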

If you need something more complicated, then perhaps Flask-Celery or the multiprocessing library's Pool might be useful to you.

I'm not too familiar with gevent, though, and I can't imagine what more complexity you might need or why.

I mean, if you're aiming for the efficiency of a major worldwide website, then I'd recommend building C++ applications to do your CPU-intensive work and using Flask-Celery or Pool to run them. (This is what YouTube does when mixing C++ and Python.)

Dexter answered Nov 01 '22 03:11


How about simply using a ThreadPool and a Queue? You can then process your work synchronously in a separate thread, and you won't have to worry about blocking at all. That said, Python is not well suited to CPU-bound tasks in the first place, so you should also consider spawning subprocesses.
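A minimal sketch of that worker-thread-plus-queue idea, using only the standard library's `queue` and `threading` modules, could look like this. The `highlight` function is a hypothetical placeholder for the real work, and `run_jobs` is an illustrative name.

```python
import queue
import threading


def highlight(source):
    """Hypothetical stand-in for the expensive highlighting call."""
    return source.upper()


def worker(jobs, results):
    """Pull jobs until a None sentinel arrives; process each synchronously."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        job_id, source = job
        results[job_id] = highlight(source)
        jobs.task_done()


def run_jobs(sources, num_workers=2):
    jobs = queue.Queue()
    results = {}
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for job_id, source in enumerate(sources):
        jobs.put((job_id, source))
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    jobs.join()         # block until every queued item is marked done
    for t in threads:
        t.join()
    return [results[i] for i in range(len(sources))]


print(run_jobs(["print(1)", "x = 2"]))  # ['PRINT(1)', 'X = 2']
```

The threads still share the GIL, so this keeps the caller's code simple but adds no CPU parallelism; for that, the subprocess route mentioned above is the next step.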

freakish answered Nov 01 '22 03:11