GOAL: spawn a few greenlet workers to handle data popped from Redis (pop from Redis, then put it into a queue).
RUNNING ENV: Ubuntu 12.04, PYTHON VER: 2.7, GEVENT VER: 1.0rc2, REDIS VER: 2.6.5, REDIS-PY VER: 2.7.1
from gevent import monkey; monkey.patch_all()
import gevent
from gevent.pool import Group
from gevent.queue import JoinableQueue
import redis
tasks = JoinableQueue()
task_group = Group()
def crawler():
    while True:
        if not tasks.empty():
            print tasks.get()
            gevent.sleep()

task_group.spawn(crawler)

redis_client = redis.Redis()
data = redis_client.lpop('test')  # <---------- Block here
tasks.put(data)
I try to pop data from Redis, but it blocks: no exception is raised, the program just freezes. If I remove the spawn call, it works. I am confused about what is happening. Please help, thank you!
gevent provides cooperative lightweight processes (not threads). The consequence is that when you have an infinite loop somewhere and the scheduler is never reentered, the program blocks, taking 100% of a CPU core.
In your example, the problem is the way you have defined the crawler loop. You have an infinite loop when tasks is empty, and because the gevent.sleep call (which would perform the necessary yield operation) is only reached when tasks is not empty, the scheduler is never reentered.
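To make the scheduling behavior concrete, here is a minimal, self-contained sketch (not from the original post; the function names are mine) contrasting a loop that never yields with one that reenters the scheduler on every iteration:

import gevent

def busy_loop():
    # No yield point: once this greenlet gets control, the scheduler never
    # runs again and every other greenlet (including the main one) is starved.
    while True:
        pass

def polite_loop():
    # Same loop, but it reenters the scheduler on every iteration.
    while True:
        gevent.sleep(0)

def reporter():
    print "reporter greenlet got a chance to run"

gevent.spawn(polite_loop)   # swap in busy_loop here to reproduce the freeze
gevent.spawn(reporter)
gevent.sleep(0.5)           # main greenlet yields; reporter prints, then we exit

With polite_loop, the reporter greenlet prints and the script exits after half a second; swap in busy_loop and the program freezes exactly like the crawler in the question.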
It seems to block on the lpop command because the Redis client only establishes its connection lazily, on the first command. The sequence of events is roughly as follows:

1. task_group.spawn(crawler) creates the crawler greenlet but does not run it yet; it will run the next time the hub (scheduler) gets control.
2. redis.Redis() only builds the client object; no connection is opened at this point.
3. redis_client.lpop('test') triggers the first socket connection, and because sockets are monkey-patched, that connect yields to the hub.
4. The hub switches to the crawler greenlet, which enters the busy loop; since tasks is empty, gevent.sleep() is never reached and control never comes back.
5. The lpop call therefore never completes and the program appears frozen, which also explains why removing the spawn call makes it work.
If you put the gevent.sleep() in the loop itself (after the if), it will work better, but it is still an inefficient way to implement a dequeuer. Something like this would be much better:
def crawler():
    while True:
        x = tasks.get()
        try:
            print "Crawler: ", x
        finally:
            tasks.task_done()
The get() call blocks the worker while the queue is empty, so it avoids the ping-pong game between the worker and the scheduler that the polling version causes.
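Putting the pieces together, a minimal end-to-end sketch could look like the following (my own assembly, not code from the answer: the feeder function, the greenlet names, and the 'test' key are illustrative, and it assumes a local Redis server holding a list named 'test'):

from gevent import monkey; monkey.patch_all()
import gevent
from gevent.queue import JoinableQueue
import redis

tasks = JoinableQueue()

def crawler():
    while True:
        x = tasks.get()              # cooperatively blocks until an item arrives
        try:
            print "Crawler: ", x
        finally:
            tasks.task_done()

def feeder():
    # Illustrative producer: drain the Redis list 'test' into the local queue.
    redis_client = redis.Redis()
    while True:
        data = redis_client.lpop('test')
        if data is None:             # nothing left in the list
            break
        tasks.put(data)

gevent.spawn(crawler)
feeder_greenlet = gevent.spawn(feeder)

feeder_greenlet.join()               # wait for the feeder to finish reading Redis
tasks.join()                         # wait for the crawler to process every item

Because tasks.get() and the monkey-patched Redis socket both yield to the hub, the feeder and the crawler take turns naturally instead of one of them starving the other.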