
Multithreading within a Celery Worker

I am using Celery with RabbitMQ to process data from API requests. The process goes as follows:

Request > API > RabbitMQ > Celery Worker > Return

Ideally I would spawn more celery workers but I am restricted by memory constraints.

Currently, the bottleneck in my process is fetching and downloading the data from the URLs passed into the worker. Roughly, the process looks like this:

def celery_gets_job(url):
    data = fetches_url(url)       # takes 0.1s to 1.0s (bottleneck)
    result = processes_data(data) # takes 0.1s
    return result

This is unacceptable, as the worker is locked up while it fetches the URL. I am looking at improving this through threading, but I am unsure what the best practices are.

  • Is there a way to make the celery worker download the incoming data asynchronously while processing the data at the same time in a different thread? (A rough sketch of this idea follows the list below.)

  • Should I have separate workers fetching and processing, with some form of message passing, possibly via RabbitMQ?
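
For the first idea, this is roughly what I have in mind (celery_gets_jobs, app, and batching the URLs into a list are placeholders; fetches_url and processes_data are the helpers from above):

from concurrent.futures import ThreadPoolExecutor

@app.task
def celery_gets_jobs(urls):
    results = []
    # Download several URLs concurrently; the GIL is released during network I/O,
    # so the remaining fetches overlap with processing on the main thread.
    with ThreadPoolExecutor(max_workers=10) as pool:
        for data in pool.map(fetches_url, urls):
            results.append(processes_data(data))
    return results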

asked Nov 18 '16 by Dominic Cabral

People also ask

Is celery multi threaded?

Most Django projects do not need to worry about using threads/multithreading at the application level. Celery (and other queue frameworks) has other benefits as well: think of it as a 'task/function manager' rather than just a way of multithreading.

Does celery use multiprocessing?

Celery itself is using billiard (a multiprocessing fork) to run your tasks in separate processes.
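
The number of those processes is set when the worker is started; for example (the app name proj and the concurrency value are just illustrative):

celery -A proj worker --pool=prefork --concurrency=4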

Do celery workers share memory?

Synchronous workers do not share memory space at all; they are completely separate processes. Adding shared state to each worker is therefore neither easy to do nor scalable.

Do celery tasks run in parallel?

Learn how to easily parallelize tasks in Python for a performance boost. Celery is an asynchronous task queue framework written in Python. Celery makes it easy to execute background tasks but also provides tools for parallel execution and task coordination.
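
For example, Celery's group primitive fans several task signatures out to the available workers in parallel (process_url here is a hypothetical task name):

from celery import group

job = group(process_url.s(url) for url in urls)
result = job.apply_async()
print(result.get())  # list with one result per URL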


1 Answer

Using the eventlet library, you can monkey-patch the standard libraries to make them asynchronous.

First, import the asynchronous (green) urllib2:

from eventlet.green import urllib2

Then you can get the URL body with:

def fetch(url):
    # urlopen here is eventlet's green version, so the read yields to other
    # green threads instead of blocking the whole worker
    body = urllib2.urlopen(url).read()
    return body

More examples are available in the eventlet documentation.
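
To actually get this benefit inside Celery, the worker should also be started with the eventlet pool, for example (the app name proj and the concurrency value are illustrative):

celery -A proj worker -P eventlet -c 100

With eventlet, the concurrency can be much higher than with prefork processes, since each green thread is cheap and spends most of its time waiting on the network.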

answered Oct 03 '22 by otorrillas