
Multithreading within a Celery Worker

I am using Celery with RabbitMQ to process data from API requests. The process goes as follows:

Request > API > RabbitMQ > Celery Worker > Return

Ideally I would spawn more celery workers but I am restricted by memory constraints.

Currently, the bottleneck in my process is fetching and downloading the data from the URLs passed into the worker. Roughly, the process looks like this:

def celery_gets_job(url):
    data = fetches_url(url)       # takes 0.1s to 1.0s (bottleneck)
    result = processes_data(data) # takes 0.1s
    return result

This is unacceptable, as the worker is locked up while it fetches the URL. I am looking at improving this through threading, but I am unsure what the best practices are.

  • Is there a way to make the celery worker download the incoming data asynchronously while processing the data at the same time in a different thread? (A rough sketch of this idea follows the list below.)

  • Should I have separate workers fetching and processing, with some form of message passing, possibly via RabbitMQ?
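
For the first idea, this is roughly what I have in mind (celery_gets_jobs, app, and batching the URLs into a list are placeholders; fetches_url and processes_data are the helpers from above):

from concurrent.futures import ThreadPoolExecutor

@app.task
def celery_gets_jobs(urls):
    results = []
    # Download several URLs concurrently; the GIL is released during network I/O,
    # so the remaining fetches overlap with processing on the main thread.
    with ThreadPoolExecutor(max_workers=10) as pool:
        for data in pool.map(fetches_url, urls):
            results.append(processes_data(data))
    return results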

asked Nov 18 '16 by Dominic Cabral

People also ask

Is celery multi threaded?

Most Django projects do not need to worry about using threads/multithreading at the application level. Celery (and other queue frameworks) has other benefits as well: think of it as a 'task/function manager' rather than just a way of multithreading.

Does celery use multiprocessing?

Celery itself is using billiard (a multiprocessing fork) to run your tasks in separate processes.
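
The number of those processes is set when the worker is started; for example (the app name proj and the concurrency value are just illustrative):

celery -A proj worker --pool=prefork --concurrency=4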

Do celery workers share memory?

Synchronous workers do not share memory space at all; they are completely separate processes. Adding shared state to each worker is therefore neither easy to do nor scalable.

Do celery tasks run in parallel?

Learn how to easily parallelize tasks in Python for a performance boost. Celery is an asynchronous task queue framework written in Python. Celery makes it easy to execute background tasks but also provides tools for parallel execution and task coordination.
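
For example, Celery's group primitive fans several task signatures out to the available workers in parallel (process_url here is a hypothetical task name):

from celery import group

job = group(process_url.s(url) for url in urls)
result = job.apply_async()
print(result.get())  # list with one result per URL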


1 Answer

Using the eventlet library, you can monkey-patch the standard libraries to make them asynchronous.

First, import the asynchronous (green) urllib2:

from eventlet.green import urllib2

Then you can get the URL body with:

def fetch(url):
    # urlopen here is eventlet's green version, so the read yields to other
    # green threads instead of blocking the whole worker
    body = urllib2.urlopen(url).read()
    return body

More examples are available in the eventlet documentation.
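
To actually get this benefit inside Celery, the worker should also be started with the eventlet pool, for example (the app name proj and the concurrency value are illustrative):

celery -A proj worker -P eventlet -c 100

With eventlet, the concurrency can be much higher than with prefork processes, since each green thread is cheap and spends most of its time waiting on the network.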

answered Oct 03 '22 by otorrillas