Gevent/Eventlet monkey patching for DB drivers

After doing Gevent/Eventlet monkey patching, can I assume that whenever a DB driver (e.g. redis-py, pymongo) does I/O through the standard library (e.g. socket), it will be asynchronous?

So is eventlet's monkey patching enough to make, e.g., redis-py non-blocking in an eventlet application?

From what I know, it should be enough as long as I'm careful about connection usage (e.g. using a different connection for each greenlet). But I want to be sure.

If you know what else is required, or how to use DB drivers correctly with Gevent/Eventlet, please mention that too.

asked Jan 23 '13 at 23:01 by Robert Zaremba

2 Answers

You can assume it will be magically patched if all of the following are true.

  • You're sure all of the I/O is built on top of standard Python sockets or other things that eventlet/gevent monkeypatch. No files, no native (C) socket objects, etc.
  • You pass aggressive=True to patch_all (or patch_select), or you're sure the library doesn't use select or anything similar.
  • The driver doesn't use any (implicit) internal threads. (If the driver does use threads internally, patch_thread may work, but it may not.)
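For gevent, the patching itself is normally a single call made before any driver module is imported (eventlet's counterpart is eventlet.monkey_patch()). The sketch below assumes gevent; patched_modules is a hypothetical helper built on gevent.monkey.is_module_patched. It only confirms that patching happened; it cannot tell you whether a particular driver actually routes its I/O through the patched modules, which is what the runtime test described next checks.

```python
# Patch first: a driver imported before this point (redis, pymongo, ...) may
# keep references to the original, blocking socket/select objects.
from gevent import monkey

# Per gevent's docs, aggressive=True also removes other blocking functions
# from select/selectors (e.g. epoll) instead of leaving them in place.
monkey.patch_all(aggressive=True)

# import redis  # hypothetical: import DB drivers only after patch_all()


def patched_modules(names=("socket", "select", "time", "threading")):
    """Hypothetical helper: report which stdlib modules gevent replaced."""
    return {name: monkey.is_module_patched(name) for name in names}


if __name__ == "__main__":
    print(patched_modules())
```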

If you're not sure, it's pretty easy to test—probably easier than reading through the code and trying to work it out. Have one greenlet that just does something like this:

import gevent

def heartbeat():
    while True:
        print("running")
        gevent.sleep(0.1)

Then have another that runs a slow query against the database. If it's monkeypatched, the looping greenlet will keep printing "running" 10 times/second; if not, the looping greenlet will not get to run while the program is blocked on the query.
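The same check can be made repeatable by counting heartbeat ticks instead of watching printed output. This is a sketch assuming gevent and no monkey patching in the process: count_heartbeats is a hypothetical helper, time.sleep stands in for a blocking driver call, and gevent.sleep stands in for a call whose I/O goes through patched sockets. With a real driver you would pass something like `lambda _: run_slow_query()`, where run_slow_query is your own code.

```python
import time

import gevent


def count_heartbeats(slow_call, duration=0.5, interval=0.05):
    """Run slow_call(duration) in one greenlet while another greenlet ticks.

    A result near duration / interval means slow_call yielded to the event
    loop; a result near zero means it blocked the whole process.
    """
    ticks = 0

    def heartbeat():
        nonlocal ticks
        while True:
            ticks += 1
            gevent.sleep(interval)

    beat = gevent.spawn(heartbeat)
    worker = gevent.spawn(slow_call, duration)
    worker.join()  # only this greenlet waits; the hub keeps running others
    beat.kill()
    return ticks


if __name__ == "__main__":
    print("blocking:", count_heartbeats(time.sleep))
    print("cooperative:", count_heartbeats(gevent.sleep))
```

In the blocking case the heartbeat gets only a tick or two in (before and right after the call); in the cooperative case it ticks roughly ten times.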

So, what do you do if your driver blocks?

The easiest solution is to use a truly concurrent threadpool for DB queries. The idea is that you fire off each query (or batch) as a threadpool job and block only the current greenlet (not the whole event loop) until that job completes. (For really simple cases, where you don't need many concurrent queries, you can just spawn a threading.Thread for each one instead, but usually you can't get away with that.)
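A minimal sketch of that pattern, assuming gevent (1.0 or later), which ships its own native-thread pool, gevent.threadpool.ThreadPool. blocking_query and query_in_pool are hypothetical names; blocking_query uses an unpatched time.sleep to stand in for a driver call that blocks outside Python sockets. The heartbeat greenlet shows the event loop keeps running while the query occupies an OS thread.

```python
import time

import gevent
from gevent.threadpool import ThreadPool


def blocking_query(seconds):
    """Hypothetical stand-in for a driver call that blocks in native code."""
    time.sleep(seconds)  # real OS-level sleep (no monkey patching assumed)
    return "rows"


def query_in_pool(pool, seconds=0.5, interval=0.05):
    """Run blocking_query on an OS thread while a heartbeat greenlet ticks."""
    ticks = 0

    def heartbeat():
        nonlocal ticks
        while True:
            ticks += 1
            gevent.sleep(interval)

    beat = gevent.spawn(heartbeat)
    # spawn() queues the job on a native thread and returns a gevent
    # AsyncResult; get() suspends only this greenlet until the job finishes.
    rows = pool.spawn(blocking_query, seconds).get()
    beat.kill()
    return rows, ticks


if __name__ == "__main__":
    pool = ThreadPool(4)  # at most 4 concurrent DB jobs on OS threads
    print(query_in_pool(pool))
    pool.kill()
```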

If the driver does significant CPU work (e.g., you're using something that runs an in-process cache, or even an entire in-process DBMS like sqlite), you want this threadpool to actually be implemented on top of processes, because otherwise the GIL may prevent your greenlets from running. Otherwise (especially if you care about Windows), you probably want to use OS threads. (However, this means you can't patch_thread(); if you need to do that, use processes.)
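The threads-versus-processes rule of thumb can be captured as a small factory over the standard concurrent.futures executors. This is a sketch only: make_db_executor and count_rows are hypothetical names, and count_rows uses an in-memory sqlite database purely as a stand-in for in-process DB work. count_rows is defined at module level because ProcessPoolExecutor has to pickle the callable, and the `__main__` guard matters on platforms that spawn worker processes. Choosing the pool is the easy part; calling .result() straight from a greenlet would still stall the event loop, so you still need the greenlet-level waiting discussed in this answer.

```python
import sqlite3
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def make_db_executor(cpu_heavy: bool, max_workers: int = 4) -> Executor:
    """Hypothetical helper encoding the rule of thumb above."""
    if cpu_heavy:
        # In-process DBMS/cache work holds the GIL; separate processes avoid it.
        return ProcessPoolExecutor(max_workers=max_workers)
    # Drivers that mostly wait on I/O release the GIL; OS threads are cheaper.
    return ThreadPoolExecutor(max_workers=max_workers)


def count_rows(n: int) -> int:
    """Module-level (hence picklable) job: fill an in-memory table, count it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(n)])
        return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    with make_db_executor(cpu_heavy=True) as pool:
        print(pool.submit(count_rows, 1000).result())
```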

If you're using eventlet, and you want to use threads, there's a built-in simple solution called tpool that may be sufficient. If you're using gevent, or you need to use processes, this won't work. Unfortunately, blocking a greenlet (without blocking the whole event loop) on a real threading object is a bit different between eventlet and gevent, and not documented very well, but the tpool source should give you the idea. Beyond that part, the rest is just using concurrent.futures (see futures on pypi if you need this in 2.x or 3.1) to execute the tasks on a ThreadPoolExecutor or ProcessPoolExecutor. (Or, if you prefer, you can go right to threading or multiprocessing instead of using futures.)


Can you explain why I should use OS threads on Windows?

The quick summary is: If you stick to threads, you can pretty much just write cross-platform code, but if you go to processes, you're effectively writing code for two different platforms.

First, read the Programming guidelines for the multiprocessing module (both the "All platforms" section and the "Windows" section). Fortunately, a DB wrapper shouldn't run into most of this. You only need to deal with processes via the ProcessPoolExecutor. And, whether you wrap things up at the cursor-op level or the query level, all your arguments and return values are going to be simple types that can be pickled. Still, it's something you have to be careful about, which otherwise wouldn't be an issue.

Meanwhile, Windows has very low overhead for its intra-process synchronization objects, but very high overhead for its inter-process ones. (It also has very fast thread creation and very slow process creation, but that's not important if you're using a pool.) So, how do you deal with that? I had a lot of fun creating OS threads to wait on the cross-process sync objects and signal the greenlets, but your definition of fun may vary.

Finally, tpool can be adapted trivially to a ppool for Unix, but it takes more work on Windows (and you'll have to understand Windows to do that work).

answered Sep 24 '22 at 02:09 by abarnert


abarnert's answer is correct and very comprehensive. I just want to add that there is no "aggressive" patching in eventlet; that is probably a gevent feature. Also, if a library uses select, that is not a problem, because eventlet can monkey patch that too.

Indeed, in most cases eventlet.monkey_patch() is all you need. Of course, it must be done before creating any sockets.

If you still have any issues, feel free to open an issue or write to the eventlet mailing list or G+ community. All relevant links can be found at http://eventlet.net/

answered Sep 21 '22 at 02:09 by temoto