
Celery tasks profiling

As I can see in the top utility, the celery processes consume a lot of CPU time, so I want to profile them.

I can do it manually on a developer machine like so:

python -m cProfile -o test-`date +%Y-%m-%d-%T`.prof ./manage.py celeryd -B

But to get accurate timings I need to profile it on the production machine. On that machine (Fedora 14) celery is launched by init scripts, e.g.

service celeryd start

I have figured out that these scripts eventually call manage.py celeryd_multi. So my question is: how can I tell celeryd_multi to start celery with profiling enabled? In my case this means adding the -m cProfile -o out.prof options to python.

Any help is much appreciated.

z4y4ts asked Sep 19 '11 15:09


1 Answer

I think you're confusing two separate issues. You could be processing too many individual tasks or an individual task could be inefficient.

You may know which of these is the problem, but it's not clear from your question which it is.

To track how many tasks are being processed I suggest you look at celerymon. If a particular task appears more often than you would expect, you can then investigate where it is being called from.

Profiling the whole of celery is probably not helpful, as you'll get profile data for lots of code that you have no control over. As you say, it also means you have a problem running it in production. I suggest you look at adding the profiling code directly into your task definition.

You can use cProfile.run('func()') as a layer of indirection between celery and your code, so that each run of the task is profiled. If you generate a unique filename and pass it as the second parameter to run, you'll end up with a directory full of profile data that you can inspect task by task, or you can use pstats.Stats.add to combine multiple task runs together.
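As a rough illustration of that idea, here is a minimal sketch, not a definitive implementation. It swaps cProfile.run('func()') for cProfile.Profile().runcall(), because run() evaluates its string in the __main__ namespace, which is awkward inside a task module. The names profiled, combined_stats, PROFILE_DIR and the example function add are all made up for this example, and how the wrapper is stacked with celery's own task decorator is left out.

```python
import cProfile
import functools
import glob
import os
import pstats
import tempfile
import uuid

# Hypothetical output directory; pick whatever suits your deployment.
PROFILE_DIR = os.path.join(tempfile.gettempdir(), "task-profiles")


def profiled(func):
    """Profile every call of func and dump each run to its own .prof file."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        os.makedirs(PROFILE_DIR, exist_ok=True)
        # Unique filename per run: function name plus a random suffix.
        path = os.path.join(
            PROFILE_DIR, "%s-%s.prof" % (func.__name__, uuid.uuid4().hex)
        )
        profiler = cProfile.Profile()
        try:
            return profiler.runcall(func, *args, **kwargs)
        finally:
            # Write the stats even if the task raised.
            profiler.dump_stats(path)
    return wrapper


def combined_stats(directory=None):
    """Merge all .prof files in a directory into one pstats.Stats object."""
    paths = sorted(glob.glob(os.path.join(directory or PROFILE_DIR, "*.prof")))
    if not paths:
        return None
    stats = pstats.Stats(paths[0])
    for path in paths[1:]:
        stats.add(path)  # pstats.Stats.add accepts filenames
    return stats


# Stand-in for the body of a real task.
@profiled
def add(x, y):
    return x + y
```

After a few runs, combined_stats().sort_stats("cumulative").print_stats(10) would print the ten most expensive entries across all recorded runs.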

Finally, per-task profiling means you can also turn profiling on or off using a setting in your project code either globally or by task, rather than needing to modify the init scripts on your server.
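One way such a switch could look is sketched below, assuming the flags are plain module-level values (in a Django project they might live in settings.py instead). PROFILE_ALL_TASKS, PROFILED_TASKS, PROFILE_DIR, maybe_profiled and both example functions are hypothetical names invented for this sketch.

```python
import cProfile
import functools
import os
import tempfile
import uuid

# Hypothetical settings: a global switch and a per-task allow-list.
PROFILE_ALL_TASKS = False
PROFILED_TASKS = {"resize_image"}
PROFILE_DIR = os.path.join(tempfile.gettempdir(), "task-profiles")


def profiling_enabled(task_name):
    """Decide at call time, so the flags can change without redeploying init scripts."""
    return PROFILE_ALL_TASKS or task_name in PROFILED_TASKS


def maybe_profiled(func):
    """Profile func only when the settings above ask for it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not profiling_enabled(func.__name__):
            return func(*args, **kwargs)  # profiling off: plain call
        os.makedirs(PROFILE_DIR, exist_ok=True)
        path = os.path.join(
            PROFILE_DIR, "%s-%s.prof" % (func.__name__, uuid.uuid4().hex)
        )
        profiler = cProfile.Profile()
        try:
            return profiler.runcall(func, *args, **kwargs)
        finally:
            profiler.dump_stats(path)
    return wrapper


@maybe_profiled
def resize_image(n):
    # Listed in PROFILED_TASKS, so each call writes a .prof file.
    return sum(i * i for i in range(n))


@maybe_profiled
def send_email(n):
    # Not listed, so it runs without any profiling overhead.
    return n + 1
```

Because the check happens inside the wrapper, flipping PROFILE_ALL_TASKS or editing PROFILED_TASKS takes effect on the next call without touching the server's init scripts.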

Andrew Wilkinson answered Oct 03 '22 07:10