
Optimizing Celery for third party HTTP calls

We are using Celery to make third-party HTTP calls. We have 100+ tasks that simply call third-party HTTP APIs. Some tasks call the APIs in bulk, for example half a million requests at 4 AM, while others are a continuous stream of API calls, receiving requests about once or twice per second.

Most API calls have a response time between 500 and 800 ms.

We are seeing very slow delivery rates with Celery. For most of the above tasks, the delivery rate ranges from about 100/s at best down to almost 1/s at worst. I believe this is very poor and something is definitely wrong, but I am not able to figure out what it is.

We started with a cluster of 3 servers and incrementally grew it to 7 servers, but with no improvement. We have tried different concurrency settings, from autoscale to fixed values of 10, 20, 50, and 100 workers. There is no result backend, and our broker is RabbitMQ.

Since our task execution time is very small, less than a second for most tasks, we have also tried various prefetch counts, from specific values up to unlimited.

--time-limit=1800 --maxtasksperchild=1000 -Ofair -c 64 --config=celeryconfig_production
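As a rough sanity check on these numbers, the sketch below estimates the throughput ceiling when every worker slot blocks on one HTTP call at a time. It assumes a 650 ms average latency (the midpoint of the 500 - 800 ms range above) and ignores broker and task overhead; `max_throughput` is a hypothetical helper, not part of Celery.

```python
def max_throughput(concurrency: int, latency_s: float, servers: int = 1) -> float:
    """Upper bound on calls per second when each worker slot blocks on one call at a time.

    Each slot completes at most 1 / latency_s calls per second, so the ceiling
    is simply (number of slots) / (latency per call).
    """
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return servers * concurrency / latency_s


if __name__ == "__main__":
    # Assumption: 650 ms average latency, -c 64 as in the options above.
    per_server = max_throughput(concurrency=64, latency_s=0.65)
    cluster = max_throughput(concurrency=64, latency_s=0.65, servers=7)
    print(f"per server: {per_server:.1f} calls/s")
    print(f"7 servers:  {cluster:.1f} calls/s")
```

Under these assumptions a single server with -c 64 tops out at roughly 98 calls/s, which is close to the best rate observed. If the whole cluster together only reaches that figure, the extra servers are probably not contributing.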

Servers have 64 GB of RAM and run CentOS 6.6.

Can you give me an idea of what could be wrong, or pointers on how to solve it?

Should we go with gevent? I have little idea of what it is, though.
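For context, gevent provides cooperative concurrency: many calls that are waiting on the network can share one process instead of each occupying a worker. Celery can run workers on it via the `-P gevent` pool option. The sketch below illustrates the idea only, using the standard library's asyncio as a stand-in for gevent. The call is simulated with a sleep, and `fake_api_call` and the 50 ms latency are assumptions made so the demo finishes quickly.

```python
import asyncio
import time
from typing import List, Tuple

SIMULATED_LATENCY_S = 0.05  # assumption: shortened stand-in for a 500-800 ms API call


async def fake_api_call(request_id: int) -> int:
    """Hypothetical stand-in for one HTTP call; it only waits, like a network read."""
    # While this coroutine sleeps, the event loop runs the other pending calls.
    await asyncio.sleep(SIMULATED_LATENCY_S)
    return request_id


async def run_batch(n_requests: int) -> List[int]:
    # Start every call at once; all of them wait concurrently in a single thread.
    return await asyncio.gather(*(fake_api_call(i) for i in range(n_requests)))


def timed_batch(n_requests: int) -> Tuple[int, float]:
    """Run the batch and return (number of completed calls, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(run_batch(n_requests))
    return len(results), time.perf_counter() - start


if __name__ == "__main__":
    count, elapsed = timed_batch(200)
    print(f"{count} simulated calls in {elapsed:.2f}s")
```

Run one after another, 200 such calls would take about 10 seconds. Run cooperatively, they overlap and finish in a small fraction of that, which is why this style suits tasks that spend most of their time waiting on a remote API.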

Madhur Ahuja asked Mar 19 '16 19:03


1 Answer

First of all, the GIL: that should not be the cause here, since adding more machines should have made things faster. Still, please check whether the load is landing on only one core of each server...

I'm not sure Celery as a whole is a good idea in your case. It is great software with a lot of functionality. But if that functionality is not needed, it is better to use something simpler, just in case some of those features interfere. I would write a small PoC and try other client software, like pika. If that does not help, the problem is in the infrastructure. If it helps, you have your solution. :)
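A minimal sketch of what such a PoC could look like, leaving Celery and the broker out entirely and measuring only how fast the calls complete at different concurrency levels. This uses a plain thread pool rather than pika. `fake_api_call` is a hypothetical placeholder with a shortened latency, and should be swapped for the real HTTP client code.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fake_api_call(_request_id: int) -> int:
    """Hypothetical stand-in for the real HTTP call; replace with the actual client code."""
    time.sleep(0.02)  # assumption: shortened latency so the demo finishes quickly
    return 200  # pretend HTTP status code


def measure_rate(call: Callable[[int], int], n_requests: int, concurrency: int) -> float:
    """Run n_requests calls on a thread pool and return completed calls per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(call, range(n_requests)))
    elapsed = time.perf_counter() - start
    return len(statuses) / elapsed


if __name__ == "__main__":
    for concurrency in (1, 10, 50):
        rate = measure_rate(fake_api_call, n_requests=100, concurrency=concurrency)
        print(f"concurrency={concurrency:>3}: {rate:.0f} calls/s")
```

If the rate scales with concurrency here but not under Celery, the worker or broker setup is the likelier suspect. If it stays flat even in this bare harness, the limit is more likely in the network or on the third-party side.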

It is really hard to tell what is going on. It could be something with I/O, or too many network calls... I would step back and find something that works. Write integration tests, but be sure to use 2-3 machines so that the full TCP stack is exercised. Be sure to have CI, and run those tests once a day or so, to see whether things are going in the right direction.

Michał Zaborowski answered Sep 23 '22 01:09