Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

gevent: downside to spawning large number of greenlets?

Following on from my question in the comment to this answer to the question "Gevent pool with nested web requests":

Assuming one has a large number of tasks, is there any downside to using gevent.spawn(...) to spawn all of them simultaneously rather than using a gevent pool and pool.spawn(...) to limit the number of concurrent greenlets?

Formulated differently: is there any advantage to "limiting concurrency" with a gevent.Pool even if not required by the problem to be solved?

Any idea what would constitute a "large number" for this issue?

like image 986
ARF Avatar asked Nov 04 '13 16:11

ARF


1 Answers

It's just cleaner and a good practice when dealing with a lot of stuff. I ran into this a few weeks ago I was using gevent spawn to verify a bunch of emails against DNS on the order of 30k :).

from gevent.pool import Pool
import logging
rows = [ ... a large list of stuff ...]
CONCURRENCY = 200 # run 200 greenlets at once or whatever you want
pool = Pool(CONCURRENCY)
count = 0

def do_work_function(param1,param2):
   print param1 + param2

for row in rows:
  count += 1 # for logging purposes to track progress
  logging.info(count)
  pool.spawn(do_work_function,param1,param2) # blocks here when pool size == CONCURRENCY

pool.join() #blocks here until the last 200 are complete

I found in my testing that when CONCURRENCY was around 200 is when my machine load would hover around 1 on a EC2 m1.small. I did it a little naively though, if I were to do it again I'd run multiple pools and sleep some time in between them to try to distribute the load on the NIC and CPU more evenly.

One last thing to keep in mind is keeping an eye on your open files and increasing that if need be: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files. The greenlets I was running were taking up around 5 file descriptors per greenlet so you can run out pretty quickly if you aren't careful. This may not be helpful if your system load is above one as you'll start seeing diminishing returns regardless.

like image 178
Jordan Shaw Avatar answered Oct 05 '22 14:10

Jordan Shaw