I'm having a bit of trouble deciding whether to use Python multiprocessing, Celery, or pp (Parallel Python) for my application.
My app is very CPU-heavy but currently uses only one CPU, so I need to spread the work across all available CPUs (which is what led me to look at Python's multiprocessing library), but I've read that this library doesn't scale out to other machines if that becomes necessary. Right now I'm not sure whether I'll need more than one server to run my code, but I'm thinking of running Celery locally, so that scaling would only mean adding new servers instead of refactoring the code (as it would if I used multiprocessing).
My question: is this logic correct? Is there any (performance) downside to using Celery locally, if it turns out a single server with multiple cores can complete my task? Or is it more advisable to use multiprocessing and grow out of it into something else later?
Thanks!
P.S. This is for a personal learning project, but I'd like to work as a developer at a firm one day and want to learn how professionals do it.
Celery itself uses billiard (a fork of multiprocessing) to run your tasks in separate processes.
Celery is an implementation of the task queue concept: an asynchronous task queue framework written in Python, commonly used by web applications to execute work outside the HTTP request-response cycle. It lets a Python application quickly set up task queues for many workers: it takes care of the hard part of receiving tasks and assigning them appropriately to workers, while you define the independent tasks those workers can run as plain Python functions. Besides background execution, Celery also provides tools for parallel execution and task coordination.
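To make that concrete, here is a minimal sketch of a Celery task module. The broker URL, the module name tasks.py, and the cpu_heavy function are illustrative assumptions, not something from the original question; it assumes a Redis broker running locally.

    # tasks.py -- minimal sketch of a Celery task module (illustrative;
    # assumes a local Redis broker at redis://localhost:6379/0).
    from celery import Celery

    app = Celery(
        "tasks",
        broker="redis://localhost:6379/0",
        backend="redis://localhost:6379/0",
    )

    @app.task
    def cpu_heavy(n):
        # Stand-in for a CPU-bound computation.
        return sum(i * i for i in range(n))

You would start a worker with celery -A tasks worker --concurrency=4 (the default prefork pool, i.e. billiard processes) and call the task from application code with cpu_heavy.delay(10_000_000), fetching the result later with .get(). Running the worker on the same machine covers the single-server case; scaling out later is then mostly a matter of starting more workers on other machines pointed at the same broker.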
I just finished a test to measure how much overhead Celery adds over multiprocessing.Pool with shared arrays. The test runs the Wiener filter on a (292, 353, 1652) uint16 array. Both versions use the same chunking (roughly: divide the 292 and 353 dimensions by the square root of the number of available CPUs). Two Celery variants were tried: one sends pickled data, the other opens the underlying data file in every worker.
Result: on my 16-core i7 CPU, Celery takes about 16 s, multiprocessing.Pool with shared arrays about 15 s. I find this difference surprisingly small. Increasing the granularity obviously increases the difference (Celery has to pass more messages): Celery takes 15 s, multiprocessing.Pool takes 12 s.
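For reference, here is a rough sketch of the multiprocessing.Pool side of that chunking scheme. It is only an approximation of the test described above: it assumes scipy.signal.wiener as the filter, pickles the chunks instead of using shared arrays, ignores boundary effects at chunk edges, and uses a smaller array; names like chunked_wiener are my own.

    # Sketch of a chunked Wiener filter with multiprocessing.Pool.
    # Assumptions: chunks are pickled (the original test used shared arrays),
    # and edge effects at chunk boundaries are ignored for simplicity.
    import math
    import multiprocessing as mp

    import numpy as np
    from scipy.signal import wiener


    def filter_chunk(chunk):
        # Each worker filters its own sub-block independently.
        return wiener(chunk.astype(np.float64))


    def chunked_wiener(data, n_procs=None):
        n_procs = n_procs or mp.cpu_count()
        # Split the first two axes into about sqrt(n_procs) pieces each,
        # mirroring the chunking described above.
        splits = max(1, int(math.sqrt(n_procs)))
        chunks = [sub
                  for block in np.array_split(data, splits, axis=0)
                  for sub in np.array_split(block, splits, axis=1)]
        with mp.Pool(processes=n_procs) as pool:
            filtered = pool.map(filter_chunk, chunks)
        # Reassemble the blocks in the order they were produced.
        rows = [np.concatenate(filtered[i * splits:(i + 1) * splits], axis=1)
                for i in range(splits)]
        return np.concatenate(rows, axis=0)


    if __name__ == "__main__":
        # The original test used a (292, 353, 1652) uint16 array; a smaller
        # array keeps this example quick.
        data = np.random.randint(0, 2**16, size=(64, 80, 128), dtype=np.uint16)
        out = chunked_wiener(data)
        print(out.shape)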
Take into account that the Celery workers were already running on the host, whereas the pool workers are forked at each run. I am not sure how I could start the multiprocessing pool once at the beginning, since I pass the shared arrays in the initializer:
with closing(Pool(processes=mp.cpu_count(), initializer=poolinit_gen, initargs=(sourcearrays, resarrays))) as p:
and only the resarrays are protected by locking.
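For what it's worth, a pool like that can be created once and reused for many runs, as long as the shared arrays exist before the workers are started. Below is a minimal sketch of that initializer pattern; the names poolinit_gen, sourcearrays and resarrays follow the snippet above, but the worker body, array layout and sizes are made up for illustration.

    # Minimal sketch of a long-lived Pool with shared arrays passed through
    # the initializer (worker body and array sizes are illustrative).
    from contextlib import closing
    import multiprocessing as mp

    import numpy as np

    _source = None
    _result = None


    def poolinit_gen(sourcearrays, resarrays):
        # Runs once in each worker process: stash the shared buffers in
        # globals so tasks only need to receive small index arguments.
        global _source, _result
        _source = sourcearrays
        _result = resarrays


    def process_item(i):
        src = np.frombuffer(_source, dtype=np.float64)   # no copy
        with _result.get_lock():                         # resarrays is lock-protected
            res = np.frombuffer(_result.get_obj(), dtype=np.float64)
            res[i] = src[i] * 2.0                        # placeholder computation


    if __name__ == "__main__":
        n = 1000
        sourcearrays = mp.RawArray("d", n)   # read-only input, no lock needed
        resarrays = mp.Array("d", n)         # output, protected by a lock
        np.frombuffer(sourcearrays, dtype=np.float64)[:] = np.arange(n)

        # The pool (and its workers) is started once and can then serve many
        # map() calls, so the fork cost is not paid on every run.
        with closing(mp.Pool(processes=mp.cpu_count(),
                             initializer=poolinit_gen,
                             initargs=(sourcearrays, resarrays))) as p:
            p.map(process_item, range(n))

The point is that the pool only has to be constructed after the shared arrays, not after every chunk of work, so the worker processes can stay up for the lifetime of the program, much like the pre-started Celery workers.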