I'm having a bit of trouble deciding whether to use Python multiprocessing, Celery, or pp (Parallel Python) for my application.
My app is very CPU-heavy but currently uses only one CPU, so I need to spread the work across all available CPUs (which is what led me to look at Python's multiprocessing library), but I've read that this library doesn't scale out to other machines if that becomes necessary. Right now I'm not sure whether I'll need more than one server to run my code, but I'm thinking of running Celery locally, so that scaling would only mean adding new servers instead of refactoring the code (as it would if I used multiprocessing).
My question: is this logic correct? Is there any (performance) downside to using Celery locally, if it turns out a single server with multiple cores can complete my task? Or is it more advisable to use multiprocessing and grow out of it into something else later?
Thanks!
P.S. This is for a personal learning project, but I'd like to work as a developer at a firm one day and want to learn how professionals do it.
Celery itself uses billiard (a fork of multiprocessing) to run your tasks in separate processes.
Celery is an implementation of the task queue concept: an asynchronous task queue framework written in Python, commonly used by web applications to execute work outside the HTTP request-response cycle. It lets a Python application quickly set up task queues for many workers: it takes care of the hard part of receiving tasks and assigning them appropriately to workers, while you define the independent tasks those workers can run as plain Python functions. Besides background execution, Celery also provides tools for parallel execution and task coordination.
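To make that concrete, here is a minimal sketch of a Celery task module. The broker URL, the module name tasks.py, and the cpu_heavy function are illustrative assumptions, not something from the original question; it assumes a Redis broker running locally.

    # tasks.py -- minimal sketch of a Celery task module (illustrative;
    # assumes a local Redis broker at redis://localhost:6379/0).
    from celery import Celery

    app = Celery(
        "tasks",
        broker="redis://localhost:6379/0",
        backend="redis://localhost:6379/0",
    )

    @app.task
    def cpu_heavy(n):
        # Stand-in for a CPU-bound computation.
        return sum(i * i for i in range(n))

You would start a worker with celery -A tasks worker --concurrency=4 (the default prefork pool, i.e. billiard processes) and call the task from application code with cpu_heavy.delay(10_000_000), fetching the result later with .get(). Running the worker on the same machine covers the single-server case; scaling out later is then mostly a matter of starting more workers on other machines pointed at the same broker.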
I just finished a test to measure how much overhead Celery adds over multiprocessing.Pool with shared arrays. The test runs the Wiener filter on a (292, 353, 1652) uint16 array. Both versions use the same chunking (roughly: divide the 292 and 353 dimensions by the square root of the number of available CPUs). Two Celery variants were tried: one sends pickled data, the other opens the underlying data file in every worker.
Result: on my 16-core i7 CPU, Celery takes about 16 s, multiprocessing.Pool with shared arrays about 15 s. I find this difference surprisingly small. Increasing the granularity obviously increases the difference (Celery has to pass more messages): Celery takes 15 s, multiprocessing.Pool takes 12 s.
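For reference, here is a rough sketch of the multiprocessing.Pool side of that chunking scheme. It is only an approximation of the test described above: it assumes scipy.signal.wiener as the filter, pickles the chunks instead of using shared arrays, ignores boundary effects at chunk edges, and uses a smaller array; names like chunked_wiener are my own.

    # Sketch of a chunked Wiener filter with multiprocessing.Pool.
    # Assumptions: chunks are pickled (the original test used shared arrays),
    # and edge effects at chunk boundaries are ignored for simplicity.
    import math
    import multiprocessing as mp

    import numpy as np
    from scipy.signal import wiener


    def filter_chunk(chunk):
        # Each worker filters its own sub-block independently.
        return wiener(chunk.astype(np.float64))


    def chunked_wiener(data, n_procs=None):
        n_procs = n_procs or mp.cpu_count()
        # Split the first two axes into about sqrt(n_procs) pieces each,
        # mirroring the chunking described above.
        splits = max(1, int(math.sqrt(n_procs)))
        chunks = [sub
                  for block in np.array_split(data, splits, axis=0)
                  for sub in np.array_split(block, splits, axis=1)]
        with mp.Pool(processes=n_procs) as pool:
            filtered = pool.map(filter_chunk, chunks)
        # Reassemble the blocks in the order they were produced.
        rows = [np.concatenate(filtered[i * splits:(i + 1) * splits], axis=1)
                for i in range(splits)]
        return np.concatenate(rows, axis=0)


    if __name__ == "__main__":
        # The original test used a (292, 353, 1652) uint16 array; a smaller
        # array keeps this example quick.
        data = np.random.randint(0, 2**16, size=(64, 80, 128), dtype=np.uint16)
        out = chunked_wiener(data)
        print(out.shape)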
Take into account that the Celery workers were already running on the host, whereas the pool workers are forked at each run. I am not sure how I could start the multiprocessing pool once at the beginning, since I pass the shared arrays in the initializer:
with closing(Pool(processes=mp.cpu_count(), initializer=poolinit_gen, initargs=(sourcearrays, resarrays))) as p:
and only the resarrays are protected by locking.
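For what it's worth, a pool like that can be created once and reused for many runs, as long as the shared arrays exist before the workers are started. Below is a minimal sketch of that initializer pattern; the names poolinit_gen, sourcearrays and resarrays follow the snippet above, but the worker body, array layout and sizes are made up for illustration.

    # Minimal sketch of a long-lived Pool with shared arrays passed through
    # the initializer (worker body and array sizes are illustrative).
    from contextlib import closing
    import multiprocessing as mp

    import numpy as np

    _source = None
    _result = None


    def poolinit_gen(sourcearrays, resarrays):
        # Runs once in each worker process: stash the shared buffers in
        # globals so tasks only need to receive small index arguments.
        global _source, _result
        _source = sourcearrays
        _result = resarrays


    def process_item(i):
        src = np.frombuffer(_source, dtype=np.float64)   # no copy
        with _result.get_lock():                         # resarrays is lock-protected
            res = np.frombuffer(_result.get_obj(), dtype=np.float64)
            res[i] = src[i] * 2.0                        # placeholder computation


    if __name__ == "__main__":
        n = 1000
        sourcearrays = mp.RawArray("d", n)   # read-only input, no lock needed
        resarrays = mp.Array("d", n)         # output, protected by a lock
        np.frombuffer(sourcearrays, dtype=np.float64)[:] = np.arange(n)

        # The pool (and its workers) is started once and can then serve many
        # map() calls, so the fork cost is not paid on every run.
        with closing(mp.Pool(processes=mp.cpu_count(),
                             initializer=poolinit_gen,
                             initargs=(sourcearrays, resarrays))) as p:
            p.map(process_item, range(n))

The point is that the pool only has to be constructed after the shared arrays, not after every chunk of work, so the worker processes can stay up for the lifetime of the program, much like the pre-started Celery workers.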