I'm starting to venture into distributed code and am having trouble figuring out which solution fits my needs based on all the stuff out there. Basically I have a python list of data that I need to process with a single function. This function has a few nested for loops but doesn't take too long(about a min) for each item on the list. My problem is the list is very large(3000+ items). I'm looking at multiprocessing but I think I want to experiment with multi-server processing it(because ideally, if the data gets larger I want to be able to have the choice of adding more servers during the job to make it run quicker).
I basically looking for something that I can distribute this data list through(and not super needed but it would be nice if I could distribute my code base through this also)
So my question is, what package can I use to achieve this? My database is hbase so I already have hadoop running(never used hadoop though, just using it for the database). I looked at celery and twisted as well but I'm confused on which will fit my needs.
Any suggestions?
I would highly recommend celery. You can define a task that operates on a single item of your list:
from celery.task import task
@task
def process(i):
# do something with i
i += 1
# return a result
return i
You can easily parallelize a list like this:
results = []
todo = [1,2,3,4,5]
for arg in todo:
res = process.apply_async(args=(arg))
results.append(res)
all_results = [res.get() for res in results]
This is easily scalable by just adding more celery workers.
check out rabbitMQ. Python bindings are available through pika. start with a simple work_queue and run few rpc calls.
It may look troublesome to experiment distributed computing in python with an external engine like rabbitMQ (there's a small learning curve for installing and configuring the rabbit) but you may find it even more useful later.
... and celery can work hand-in-hand with rabbitMQ, checkout robert pogorzelski's tutorial and Simple distributed tasks with Celery and RabbitMQ
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With