We have ~300 celeryd processes running under Ubuntu 10.04 64-bit. When idle, every process takes ~19 MB RES and ~174 MB VIRT, so that's around 6 GB of RAM in idle for all processes. In the active state a process takes up to 100 MB RES and ~300 MB VIRT.
Every process uses minidom (the XML files are < 500 KB, simple structure) and urllib.
The question is: how can we decrease RAM consumption, at least for idle workers? Perhaps some celery or Python options may help. Also, how can we determine which part takes the most memory?
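One rough way to see which stage eats memory (my own sketch, not from the original setup) is to log the process's peak RSS before and after each stage inside a worker, using the standard resource module:

    import resource

    def rss_kb():
        # On Linux, ru_maxrss is the peak resident set size in kilobytes.
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    before = rss_kb()
    # ... run one stage here, e.g. the urllib fetch or the minidom parse ...
    after = rss_kb()
    print "stage added ~%d KB to peak RSS" % (after - before)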
UPD: these are flight search agents, one worker per agency/date. We have 10 agencies, and one user search == 9 dates, so we have 10*9 = 90 agents per user search.
Is it possible to start celeryd processes on demand to avoid idle workers (something like MaxSpareServers on Apache)?
UPD2: The agent lifecycle is: send an HTTP request, wait ~10-20 seconds for the response, parse the XML (takes less than 0.02 s), save the result to MySQL.
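For context, a minimal sketch of what such an agent task might look like (assuming the celery 2.x task decorator; the task name, URL handling and save_result helper are made up for illustration, not taken from the real code):

    from celery.task import task
    from xml.dom import minidom
    import urllib2

    @task
    def search_agency(agency_url, date):
        # Almost all of the task's lifetime is spent waiting on this request
        # (~10-20 s), so the worker is IO-bound rather than CPU-bound.
        response = urllib2.urlopen(agency_url, timeout=30)
        dom = minidom.parseString(response.read())  # XML is < 500 KB
        save_result(date, dom)  # hypothetical helper that writes to MySQL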
Read this:
http://docs.celeryproject.org/en/latest/userguide/workers.html#concurrency
It sounds like you have one worker per celeryd. That seems wrong. You should have dozens of workers per celeryd. Keep raising the number of workers (and lowering the number of celeryd instances) until your system is very busy and very slow.
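As a hedged illustration (using celery's standard concurrency setting, not your actual config), a single celeryd with a larger pool instead of hundreds of single-process daemons would look roughly like this in celeryconfig.py:

    # celeryconfig.py
    # One celeryd with a pool of worker processes, rather than ~300 separate daemons.
    CELERYD_CONCURRENCY = 10  # e.g. a few pool processes per CPU core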
S. Lott is right. The main instance consumes messages and delegates them to worker pool processes. There is probably no point in running 300 pool processes on a single machine! Try 4 or 5 multiplied by the number of CPU cores. You may gain something by running more than one celeryd with a few processes each, as some people do, but you would have to experiment for your application.
See http://celeryq.org/docs/userguide/workers.html#concurrency
For the upcoming 2.2 release we're working on Eventlet pool support; this may be a good alternative for IO-bound tasks, as it will enable you to run 1000+ threads with minimal memory overhead. It's still experimental, and bugs are being fixed for the final release.
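If I read the upcoming options correctly (treat the exact flags as tentative, since the feature is still experimental), selecting that pool would look something like:

    celeryd -P eventlet -c 1000

i.e. one worker process running 1000 green threads instead of 1000 OS processes.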
See http://groups.google.com/group/celery-users/browse_thread/thread/94fbeccd790e6c04
The upcoming 2.2 release also has support for autoscaling, which adds/removes processes on demand. See the Changelog: http://ask.github.com/celery/changelog.html#version-2-2-0 (this changelog is not completely written yet).
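If it ships as described there (again, the syntax may still change), the option takes a max/min pair, e.g.:

    celeryd --autoscale=10,3

which grows the pool up to 10 processes under load and shrinks it back to 3 when idle, close to the MaxSpareServers-style behaviour asked about above.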
The natural number of workers is close to the number of cores you have. The workers are there so that CPU-intensive tasks can use an entire core efficiently. The broker is there so that requests that don't have a worker on hand to process them are kept queued. The number of queues can be high, but that doesn't mean you need a high number of brokers. A single broker should suffice, or you could shard queues to one broker per machine if it later turns out that fast worker-queue interaction is beneficial.
Your problem seems unrelated to that. I'm guessing that your agencies don't provide a message queue API, and you have to keep lots of outstanding requests around. If so, you need a few (emphasis on not many) evented processes, for example Twisted- or node.js-based.
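As a rough sketch of that approach (my own illustration with made-up URLs; eventlet is just one of several evented options):

    import eventlet
    from eventlet.green import urllib2  # non-blocking drop-in for urllib2

    def fetch(url):
        # Each call yields while waiting on the network, so hundreds of
        # requests can be in flight inside a single process.
        return urllib2.urlopen(url, timeout=30).read()

    pool = eventlet.GreenPool(200)  # up to 200 concurrent requests
    urls = ["http://agency.example/search?date=%d" % d for d in range(9)]  # 9 dates per search
    for body in pool.imap(fetch, urls):
        pass  # parse with minidom and save to MySQL here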