
gevent blocks on a Redis socket request

GOAL: spawn a few greenlet workers to process data popped from Redis (pop from Redis, then put into a queue)

RUNNING ENV: Ubuntu 12.04, Python 2.7, gevent 1.0 RC2, Redis 2.6.5, redis-py 2.7.1

from gevent import monkey; monkey.patch_all()
import gevent
from gevent.pool import Group
from gevent.queue import JoinableQueue
import redis

tasks = JoinableQueue()
task_group = Group()

def crawler():
    while True:
        if not tasks.empty():
            print tasks.get()
            gevent.sleep()

task_group.spawn(crawler)
redis_client = redis.Redis()
data = redis_client.lpop('test') #<----------Block here
tasks.put(data)

I try to pop data from Redis, but the call blocks. No exception is raised; the program just freezes. If I remove the spawn call, it works. I'm confused about what is happening. Please help, thank you!

XIO asked Dec 29 '12 03:12


1 Answer

gevent provides cooperative lightweight processes (greenlets), not preemptive threads. As a consequence, if a greenlet enters an infinite loop that never returns control to the scheduler, the whole program blocks and takes 100% of a CPU core.

In your example, the problem is the way the crawler loop is written. When tasks is empty, the loop spins without ever yielding: the gevent.sleep() call, which would perform the necessary yield, only runs when tasks is not empty, so the scheduler is never reentered.

It appears to block on the lpop command because the Redis client defers opening the connection until the first command is sent. The sequence of events is as follows:

  • the crawler greenlet is spawned in the task group, but no greenlet is scheduled yet
  • redis_client is built, but it does not generate any I/O yet, since the actual connection is deferred
  • lpop is called; this time the connection is really needed, so the Redis client has to wait for the connection and for the reply to lpop, and it therefore yields to the scheduler
  • the scheduler activates the crawler worker
  • the crawler loops forever without yielding, since the tasks queue is still empty, so the main greenlet never resumes
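
The sequence above can be reproduced without gevent by using a toy round-robin scheduler built on plain Python generators. This is only a sketch of cooperative scheduling in general, not of gevent's actual implementation: simulate, Starved and spin_limit are made-up names, and the original (buggy) loop is cut off after a fixed number of spins so that the example terminates instead of hanging.

```python
from collections import deque


class Starved(Exception):
    """Raised by the toy crawler instead of spinning forever."""


def simulate(yield_when_empty, spin_limit=1000):
    queue = deque()   # stands in for the gevent tasks queue
    log = []

    def main_task():
        log.append("main: lpop sent, waiting for reply")
        yield                      # stands in for the socket wait inside lpop
        queue.append("item")
        log.append("main: put item")

    def crawler():
        spins = 0
        while True:
            if queue:
                log.append("crawler: got " + queue.popleft())
                return
            if yield_when_empty:
                yield              # fixed loop: hand control back on every pass
            else:
                spins += 1         # original loop: nothing here ever yields
                if spins >= spin_limit:
                    raise Starved()

    ready = deque([main_task(), crawler()])
    try:
        while ready:
            task = ready.popleft()
            try:
                next(task)         # run the task until its next yield
            except StopIteration:
                continue           # task finished, drop it
            ready.append(task)     # task yielded, schedule it again
    except Starved:
        log.append("crawler: spun %d times, main never resumed" % spin_limit)
    return log


for line in simulate(yield_when_empty=False):
    print(line)
print("---")
for line in simulate(yield_when_empty=True):
    print(line)
```

With yield_when_empty=False the log stops at the spinning crawler and the main task never gets to put its item. With yield_when_empty=True the main task resumes, puts the item, and the crawler then picks it up.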

If you move the gevent.sleep() call into the loop body itself (after the if block), it will work, but polling the queue this way is still an inefficient way to implement a dequeuer. Something like this would be much better:

def crawler():
    while True:
        x = tasks.get()
        try:
            print "Crawler: ",x
        finally:
            tasks.task_done()

The get() call blocks the worker until an item is available, so it avoids the ping-pong between the worker and the scheduler while the queue is empty.
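
For completeness, here is a minimal sketch of how a blocking crawler like this might fit into a whole program. It assumes gevent is installed. fake_lpop is a hypothetical stand-in for redis_client.lpop so that the example runs without a Redis server, and monkey.patch_all() is left out because no real socket is involved. The names seen, source and worker-N are also just for illustration.

```python
import gevent
from gevent.queue import JoinableQueue

tasks = JoinableQueue()
seen = []   # (worker name, item) pairs, recorded instead of printing


def crawler(name):
    while True:
        item = tasks.get()          # blocks only this greenlet while the queue is empty
        try:
            seen.append((name, item))
        finally:
            tasks.task_done()       # lets tasks.join() know the item is handled


def fake_lpop(source):
    """Hypothetical stand-in for redis_client.lpop; returns None when empty."""
    gevent.sleep(0)                 # yield to the scheduler, as socket I/O would
    return source.pop(0) if source else None


workers = [gevent.spawn(crawler, "worker-%d" % i) for i in range(2)]

source = ["a", "b", "c"]            # pretend contents of the Redis list
while True:
    data = fake_lpop(source)
    if data is None:                # list drained, stop producing
        break
    tasks.put(data)

tasks.join()                        # wait until every queued item is marked done
gevent.killall(workers)             # the workers loop forever, so stop them explicitly
```

Because the workers block in get() rather than spinning, the main greenlet is free to keep popping and queueing items, and join() returns once all of them have been processed.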

Didier Spezia answered Nov 14 '22 21:11