
Celery concurrency configuration for io/cpu bound task

I have tasks that need to load a few large files from the internet and then do some processing. Running synchronously the loading would take ~3s and the processing ~0.2s. Although the processing is much faster than loading, it still takes a considerable amount of time.

I wonder what would be the best celery configuration to handle my scenario. Multi-processing, Eventlet, or maybe something else?

Xyand asked Oct 04 '22 07:10


1 Answer

This question really calls for a comparison of multi-processing/threading versus green threads. Generally speaking, in the context of Celery concurrency, it doesn't make much difference which one you use, unless you have limited resources (and too many tasks), or you are I/O-bound and making a lot of outbound connections. In that case you will have to go "green" and use eventlet.
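To see why I/O-bound work benefits from cheap concurrency, here is a minimal sketch using only the standard library. It is not Celery code: `fake_download` is a hypothetical stand-in for a network fetch, and plain threads stand in for greenlets. The point is only that waits overlap when nothing is using the CPU.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(seconds):
    # Hypothetical stand-in for a network fetch: the caller only waits,
    # no CPU work is done while sleeping.
    time.sleep(seconds)
    return seconds


def run_concurrently(n_tasks, seconds):
    # Start all the "downloads" at once and measure the total wall time.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        results = list(pool.map(fake_download, [seconds] * n_tasks))
    return results, time.perf_counter() - start


results, elapsed = run_concurrently(n_tasks=5, seconds=0.2)
print(f"{len(results)} waits of 0.2s finished in {elapsed:.2f}s")
```

Run one after another, the five waits would take about 1 second; run concurrently, they should take roughly as long as a single wait. Greenlets give the same overlap with much less overhead per task, which is why they suit tasks that mostly wait on the network.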

A good example: Instagram presented at PyCon 2013 (Messaging at Scale at Instagram), and they use both. Their main workload runs as threaded tasks, yet they use the "green" approach for tasks that do nothing but make outbound requests to other websites, like Twitter, Facebook, and Tumblr. Those tasks don't deserve a whole thread/process, as no real processing happens in them; moreover, the request/response cycle takes some time, so the best thing to do with those tasks is to make them green.

You can create one or more workers that use a thread/process per task and only consume tasks from specific queues, and other workers that use greenlets and only consume tasks from other queues. Then you can decide which task goes where, as per the above explanation!
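A rough sketch of that split, under some assumptions: the task names (`tasks.fetch_*`, `tasks.process_*`) and the queue names (`io`, `cpu`) are made up for illustration. The dict has the shape Celery accepts for its `task_routes` setting; the `queue_for` helper is not part of Celery, it only mimics the glob matching so you can check where a task name would land.

```python
from fnmatch import fnmatch

# Hypothetical routing table. With Celery you would assign it as:
#     app.conf.task_routes = TASK_ROUTES
TASK_ROUTES = {
    "tasks.fetch_*": {"queue": "io"},     # network-bound: green workers
    "tasks.process_*": {"queue": "cpu"},  # CPU-bound: process workers
}


def queue_for(task_name, routes=TASK_ROUTES, default="celery"):
    # Illustrative helper (not a Celery API): return the queue of the
    # first pattern that matches, else Celery's default queue name.
    for pattern, options in routes.items():
        if fnmatch(task_name, pattern):
            return options["queue"]
    return default


print(queue_for("tasks.fetch_files"))    # io
print(queue_for("tasks.process_files"))  # cpu
```

You would then start one worker per queue, for example `celery -A tasks worker -Q io -P eventlet -c 100` for the green one and `celery -A tasks worker -Q cpu -P prefork -c 4` for the process-based one. The module name `tasks` and the concurrency numbers are placeholders to tune for your own load.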

securecurve answered Oct 12 '22 22:10