 

How do greenlets work?

Tags:

python

How are greenlets implemented? Python uses the C stack for the interpreter and heap-allocates Python stack frames, but beyond that, how does it allocate/swap stacks, how does it hook into the interpreter and function call mechanisms, and how does this interact with C extensions? (Any quirks?)

There are some comments at the top of greenlet.c in the source, but they're a bit opaque. FWIW I'm coming from the perspective of someone who is unfamiliar with CPython internals but is very familiar with low-level systems programming, C, threads, events, coroutines/cooperative threads, kernel programming, etc.

(Some data points: they don't use ucontext.h and they do 2x memcpy, alloc, and free on every context switch.)

Yang asked Jul 28 '10 00:07




2 Answers

When a Python program runs, you essentially have two pieces of code running under the hood.

First, there is the CPython interpreter's own C code, which uses the standard C stack for its internal stack frames. Second, there is the interpreted Python bytecode, which does not use the C stack but keeps its stack frames on the heap. A greenlet runs ordinary Python code and thus behaves identically.
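To make the "Python frames live on the heap" point concrete, here is a small sketch that only uses the standard library. It relies on `sys._getframe`, which is a CPython implementation detail, and the function names (`inner`, `outer`, `counter`) are made up for illustration. It shows that frames are ordinary objects linked by `f_back`, and that a suspended generator keeps its frame (and locals) alive between calls.

```python
import sys


def inner():
    # Walk the chain of Python frame objects from the current call outward.
    names = []
    frame = sys._getframe()
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names


def outer():
    return inner()


call_chain = outer()  # begins with "inner", then "outer"


def counter():
    i = 0
    while True:
        i += 1
        yield i


gen = counter()
next(gen)
next(gen)

# The generator is suspended, yet its frame object still exists and
# still holds the local variable i.
suspended_frame = gen.gi_frame
suspended_i = suspended_frame.f_locals["i"]
```

None of this touches the interpreter's C stack directly, which is the part the greenlet extension has to manage itself.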

Now in a typical microthreaded application, you'd have thousands if not millions of microthreads (greenlets) switching all over the place. Each switch is essentially equivalent to a function call with a deferred return (so to speak) and thus uses a bit of stack. The problem is that the interpreter's C stack will sooner or later overflow. This is exactly the problem the greenlet extension addresses: it is designed to move pieces of the stack back and forth between the C stack and the heap in order to avoid it.
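As a rough analogy only (this is not how greenlet works internally), the sketch below models each "switch" as a plain nested function call whose return is deferred forever. Python's recursion limit stands in for the finite C stack, and the names `ping`, `pong`, and `run_nested_switches` are hypothetical.

```python
def run_nested_switches():
    """Count how many naive 'switches' fit before the stack limit is hit."""
    hops = 0

    def ping():
        nonlocal hops
        hops += 1
        pong()  # "switch" to the other task; this call never returns normally

    def pong():
        nonlocal hops
        hops += 1
        ping()  # "switch" back, nesting one level deeper each time

    try:
        ping()
    except RecursionError:
        # Every hop left a frame behind, so the stack eventually ran out.
        return hops


hops_before_overflow = run_nested_switches()
```

The count is bounded by the recursion limit, no matter how little work each task does, which is why switches cannot simply be nested calls.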

As you know, there are three fundamental events with greenlets: a spawn, a switch, and a return. Let's look at those in turn:

A) A Spawn

The newly spawned greenlet is associated with its own base address in the stack (the point where we currently are). Apart from that, nothing special happens. The Python code of the newly spawned greenlet uses the heap in the normal way and the interpreter continues using the C stack as usual.

B) A Switch

When one greenlet switches to another, the relevant part of the C stack (starting from the base address of the switching greenlet) is copied to the heap. The copied C-stack area is then freed, and the target greenlet's previously saved interpreter stack data is copied from the heap into the newly freed C-stack area. The Python code of the target greenlet continues using the heap in the normal way. Of course, the extension code keeps track of all of this (which heap section belongs to which greenlet, and so on).

C) A Return

The stack is left untouched, and the heap area of the returning greenlet is freed by the Python garbage collector.
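The three events above can be sketched with a toy model. This is not the real greenlet code: a `bytearray` stands in for the C stack, `ToyGreenlet` and `ToyStack` are hypothetical names, and the model always saves the whole slice on a switch, whereas the real extension is more selective about what it copies.

```python
class ToyGreenlet:
    def __init__(self, name, base):
        self.name = name
        self.base = base   # where this greenlet's slice of the fake stack starts
        self.saved = None  # heap copy of the slice while switched out


class ToyStack:
    def __init__(self, size=16):
        self.mem = bytearray(size)  # stands in for the C stack
        self.top = 0                # first unused index
        self.current = None         # None means the root code is running

    def push(self, data):
        # Simulate the running code consuming some stack.
        if self.top + len(data) > len(self.mem):
            raise OverflowError("toy stack overflow")
        self.mem[self.top:self.top + len(data)] = data
        self.top += len(data)

    def spawn(self, name):
        # A) Spawn: only record the current stack position as the base.
        return ToyGreenlet(name, self.top)

    def switch(self, target):
        # B) Switch: copy the running greenlet's slice to the heap ...
        cur = self.current
        if cur is not None:
            cur.saved = bytes(self.mem[cur.base:self.top])
        # ... release that area, then copy the target's saved slice back in.
        self.top = target.base
        if target.saved is not None:
            self.mem[self.top:self.top + len(target.saved)] = target.saved
            self.top += len(target.saved)
            target.saved = None
        self.current = target

    def finish(self):
        # C) Return: nothing is copied; the stack bytes are left as they are
        # and the greenlet object is simply dropped for the GC to reclaim.
        done = self.current
        self.top = done.base
        self.current = None
        return done


stack = ToyStack()
stack.push(b"root")
a = stack.spawn("a")
b = stack.spawn("b")  # same base as a: both start where the root stopped

stack.switch(a)
stack.push(b"AAA")
stack.switch(b)       # a's slice b"AAA" is moved to the heap
stack.push(b"BB")
stack.switch(a)       # b's slice is saved, a's slice is copied back

live = bytes(stack.mem[:stack.top])
```

After the last switch, the live part of the fake stack holds the root data followed by `a`'s data again, while `b`'s data sits in its heap copy, which is the bookkeeping the answer describes.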

Basically, this is it. Many more details and explanations can be found at http://www.stackless.com/pipermail/stackless-dev/2004-March/000022.html or just by reading the code, as pointed out in Alex's answer.

Rabih Kodeih answered Sep 28 '22 23:09


If you get and study greenlet's sources, you'll see at the top of greenlet.c a long comment that starts at line 16 with the following summary:

A PyGreenlet is a range of C stack addresses that must be saved and restored in such a way that the full range of the stack contains valid data when we switch to it.

and continues to line 82, summarizing exactly what you're asking about. Have you studied these lines (and the following 1000+ implementing them;-)...? I don't see a way to further squeeze these 66 lines down while still making sense, nor any added value in copying and pasting them here.

Basically, you'll see there is no real "hooking" to speak of (the C-level stack is switched back and forth "under the interpreter's nose", so to speak), except for the delicate interactions with thread state in multi-threaded code. The saving and restoring of a greenlet's state from/to the stack is based on memcpy calls, plus some calls to the Python memory manager to allocate/reallocate and free the space that holds data coming from, or going back to, the stack. The three functions at lines 227-295 handle the grunt work, and they're wrapped in a couple of C macros at lines 298-310 "in order to simplify maintenance", as the comment there says.

The interface through which other C extensions can interact with the greenlet extension is implemented at lines 956-1045, and exposed through the "CObject API" (via greenlet.h, of course) documented here.

Alex Martelli answered Sep 28 '22 22:09