
Copy a generator

Let's say I have a generator like so:

def gen():
    a = yield "Hello World"
    a_ = a + 1 #Imagine that on my computer "+ 1" is an expensive operation
    print "a_ = ", a_
    b = yield a_
    print "b =", b
    print "a_ =", a_
    yield b

Now let's say I do:

>>> g = gen()
>>> g.next()
'Hello World'
>>> g.send(42)
a_ =  43
43

Now a_ has been calculated, and I would like to clone my generator like so:

>>> newG = clonify(g)
>>> newG.send(7)
b = 7
a_ = 43
7

while my original g still works:

>>> g.send(11)
b = 11
a_ = 43
11

Specifically, clonify takes the state of a generator and copies it. I could just create a fresh generator and advance it to the same point as the old one, but that would require calculating a_ again. Note also that I do not want to modify the generator extensively. Ideally, I could just take a generator object from a library and clonify it.
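To make the "reset" fallback concrete, here is a minimal sketch of it, written for Python 3 (so send(None) primes the generator instead of g.next()). ReplayableGenerator and its clone method are hypothetical names invented for this illustration, not a library API. The calls list shows the drawback: the expensive step runs again in every clone.

```python
class ReplayableGenerator:
    """Hypothetical wrapper: remembers every sent value so a clone can replay them."""

    def __init__(self, factory, history=()):
        self._factory = factory
        self._gen = factory()
        self._history = []
        for value in history:          # replaying re-executes the generator body
            self.send(value)

    def send(self, value=None):
        result = self._gen.send(value)  # send(None) on a fresh generator primes it
        self._history.append(value)
        return result

    def clone(self):
        return ReplayableGenerator(self._factory, self._history)


calls = []

def gen():
    a = yield "Hello World"
    calls.append("expensive")  # stands in for the costly "+ 1"
    a_ = a + 1
    b = yield a_
    yield b

g = ReplayableGenerator(gen)
g.send(None)            # prime: runs to the first yield
g.send(42)              # computes a_ once
new_g = g.clone()       # replays None and 42, so a_ is computed a second time
clone_result = new_g.send(7)
original_result = g.send(11)
print(clone_result, original_result, len(calls))
```

This gives independent generators, but only by paying for a_ twice, which is exactly what clonify is supposed to avoid.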

Note: itertools.tee won't work, because it does not handle sends.
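A quick check of that limitation, written for Python 3: the objects returned by itertools.tee are plain iterators and expose no send method, so they can only be advanced with next().

```python
import itertools

def gen():
    a = yield "Hello World"
    yield a

first, second = itertools.tee(gen())

has_send = hasattr(first, "send")   # tee objects only implement the iterator protocol
first_value = next(first)
second_value = next(second)         # served from tee's buffer, not from a copied generator
print(has_send, first_value, second_value)
```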

Note: I only care about generators created by placing yield statements in a function.

PyRulez asked Apr 23 '15 22:04


1 Answer

Python doesn't have any support for cloning generators.
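The lack of support is easy to observe with the standard copy module. A small Python 3 sketch: both the shallow and the deep copy of a suspended generator are rejected with a TypeError.

```python
import copy

def gen():
    a = yield "Hello World"
    yield a + 1

g = gen()
next(g)  # suspend the generator at its first yield

errors = {}
for name, copier in [("copy", copy.copy), ("deepcopy", copy.deepcopy)]:
    try:
        copier(g)
    except TypeError as exc:
        errors[name] = exc   # generators cannot be reduced for copying

for name, exc in errors.items():
    print(name, "->", type(exc).__name__)
```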

Conceptually, this should be implementable, at least for CPython. But practically, it turns out to be very hard.


Under the covers, a generator is basically nothing but a wrapper around a stack frame.*

And a frame object is essentially just a code object, an instruction pointer (an index into that code object), the builtins/globals/locals environment, an exception state, and some flags and debugging info.
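Those pieces can be inspected on a live generator. The following is a Python 3 sketch using the generator from the question (minus the prints); the summary dict is just a convenience for this example, and dict() is used so the snapshot is a plain dictionary regardless of how the interpreter represents f_locals.

```python
def gen():
    a = yield "Hello World"
    a_ = a + 1
    b = yield a_
    yield b

g = gen()
next(g)        # run to the first yield
g.send(42)     # run to the second yield; a_ is now 43

frame = g.gi_frame
summary = {
    "code": frame.f_code.co_name,                       # the code object being executed
    "instruction_pointer": frame.f_lasti,               # index of the last instruction run
    "locals": dict(frame.f_locals),                     # snapshot of the local variables
    "shares_globals": frame.f_globals is gen.__globals__,
    "has_builtins": "len" in frame.f_builtins,
}
for key, value in summary.items():
    print(key, "=", value)
```

Note that b does not appear in the locals yet, because the generator is suspended before the assignment to b completes.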

And both types are exposed to the Python level,** as are all the bits they need. So, it really should be just a matter of:

  • Create a frame object just like g.gi_frame, but with a copy of the locals instead of the original locals. (All the user-level questions come down to whether to shallow-copy, deep-copy, or one of the above plus recursively cloning generators here.)
  • Create a generator object out of the new frame object (and its code and running flag).

And there's no obvious practical reason it shouldn't be possible to construct a frame object out of its bits, just as it is for a code object or most of the other hidden builtin types.


Unfortunately, as it turns out, Python doesn't expose a way to construct a frame object. I thought you could get around that just by using ctypes.pythonapi to call PyFrame_New, but the first argument to that is a PyThreadState—which you definitely can't construct from Python, and shouldn't be able to. So, to make this work, you either have to:

  • Reproduce everything PyFrame_New does by banging on the C structs via ctypes, or
  • Manually build a fake PyThreadState by banging on the C structs (which will still require reading the code to PyFrame_New carefully to know what you have to fake).
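The missing constructor behind both workarounds can be confirmed from Python. This sketch assumes Python 3.8 or later, where code objects offer a replace() method: a code object can be rebuilt from its parts, while the frame and generator types refuse to be instantiated at all.

```python
import types

def gen():
    yield 1

g = gen()

# Code objects can be rebuilt from Python (code.replace needs Python 3.8+).
renamed_code = g.gi_code.replace(co_name="gen_copy")

# Frame and generator types have no Python-level constructor.
failures = {}
for kind in (types.FrameType, types.GeneratorType):
    try:
        kind()
    except TypeError as exc:
        failures[kind.__name__] = str(exc)

print(renamed_code.co_name)
for name, message in failures.items():
    print(name, "->", message)
```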

I think this may still be doable (and I plan to play with it; if I come up with anything, I'll update the Cloning generators post on my blog), but it's definitely not going to be trivial—or, of course, even remotely portable.


There are also a couple of minor problems.

  • Locals are exposed to Python as a dict (whether you call locals() for your own, or access g.gi_frame.f_locals for a generator you want to clone). Under the covers, locals are actually stored in an array of "fast" slots inside the C frame struct.*** You can get around this by using ctypes.pythonapi to call PyFrame_LocalsToFast and PyFrame_FastToLocals. But the dict just contains the values, not cell objects, so doing this shuffle will turn all nonlocal variables into local variables in the clone.****

  • Exception state is exposed to Python as a type/value/traceback 3-tuple, but inside a frame there's also a borrowed (non-refcounted) reference to the owning generator (or NULL if it's not a generator frame). (The source explains why.) So, your frame-constructing function can't refcount the generator or you have a cycle and therefore a leak, but it has to refcount the generator or you have a potentially dangling pointer until the frame is assigned to a generator. The obvious answer seems to be to leave the generator NULL at frame construction, and have the generator-constructing function do the equivalent of self.gi_f.f_generator = self; Py_DECREF(self).
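The first of these problems, values versus cell objects, can be seen directly. In this Python 3 sketch (outer and total are names made up for the example), the closure variable shows up in the frame's locals as a bare value, while the function's __closure__ holds the actual cell that the original generator shares with its enclosing scope.

```python
def outer():
    total = 10

    def gen():
        x = yield total      # 'total' is a free (closure) variable of gen
        yield x + total

    return gen, gen()

gen_func, g = outer()
next(g)  # suspend at the first yield

free_names = g.gi_code.co_freevars
cell = gen_func.__closure__[0]             # the real cell object
snapshot = dict(g.gi_frame.f_locals)       # only the unwrapped value is visible here

print(free_names)
print(type(cell).__name__, cell.cell_contents)
print(type(snapshot["total"]).__name__, snapshot["total"])
```

A clone built only from that dict would have no way to know that total was shared through a cell rather than owned by the frame.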


* It also keeps a copy of the frame's code object and running flag, so they can be accessed after the generator exits and disposes of the frame.
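A short Python 3 check of that footnote: once the generator is exhausted its frame is gone, but the code object and running flag are still reachable through the generator itself.

```python
def gen():
    yield 1

g = gen()
had_frame = g.gi_frame is not None   # a fresh generator owns a frame

list(g)                              # exhaust the generator

frame_after = g.gi_frame             # the frame has been disposed of
code_name_after = g.gi_code.co_name  # the generator's own reference to the code survives
running_after = g.gi_running
print(had_frame, frame_after, code_name_after, running_after)
```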

** generator and frame are hidden from builtins, but they're available as types.GeneratorType and types.FrameType. And they have docstrings, descriptions of their attributes in the inspect module, etc., just like function and code objects.

*** When you compile a function definition, the compiler makes a list of all the locals, stored in co_varnames, and turns each variable reference into a LOAD_FAST/STORE_FAST opcode with the index into co_varnames as its argument. When a function call is executed, the frame object reserves len(co_varnames) PyObject * slots at the start of its f_localsplus array (the evaluation stack, whose base is stored in f_valuestack, comes after them), and then LOAD_FAST 0 just reads f_localsplus[0]. Closures are more complicated; a bit too much to explain in a comment on an SO answer.
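The compile-time half of that footnote is visible with the dis module. A Python 3 sketch (f is a throwaway function for the example); the exact opcode names vary between CPython versions, so the code only filters for names containing FAST rather than expecting a particular listing.

```python
import dis

def f(x):
    y = x + 1
    return y

varnames = f.__code__.co_varnames   # the compiler's list of local names

# Collect the instructions that read or write fast-local slots.
fast_ops = [
    (ins.opname, ins.argval)
    for ins in dis.get_instructions(f)
    if "FAST" in ins.opname
]

print(varnames)
for opname, argval in fast_ops:
    print(opname, argval)
```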

**** I'm assuming you wanted the clone to share the original's closure references. If you were hoping to recursively clone all the frames up the stack to get a new set of closure references to bind, that adds another problem: there was no direct way to construct new cell objects from Python either (types.CellType only became available in Python 3.8).

abarnert answered Oct 14 '22 08:10