Let's say I have a generator like so:

```python
def gen():
    a = yield "Hello World"
    a_ = a + 1  # Imagine that on my computer "+ 1" is an expensive operation
    print("a_ =", a_)
    b = yield a_
    print("b =", b)
    print("a_ =", a_)
    yield b
```
Now let's say I do:

```python
>>> g = gen()
>>> next(g)
'Hello World'
>>> g.send(42)
a_ = 43
43
```
Now we have calculated `a_`. Now I would like to clone my generator like so:

```python
>>> newG = clonify(g)
>>> newG.send(7)
b = 7
a_ = 43
7
```
but my original `g` still works:

```python
>>> g.send(11)
b = 11
a_ = 43
11
```
Specifically, `clonify` takes the state of a generator and copies it. I could just reset my generator to be like the old one, but that would require recalculating `a_`. Note also that I would not want to modify the generator extensively. Ideally, I could just take a generator object from a library and `clonify` it.
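For comparison, the "reset" approach the question rules out can be sketched as a wrapper that records every value passed to `send()` and replays that history into a fresh generator. This is a minimal sketch, not a real `clonify`: `ReplayableGenerator`, its `clone()` method, and the `expensive_calls` counter are hypothetical names introduced for illustration, and it assumes the generator function can simply be called again with the same arguments. As the question points out, it re-runs all the expensive work.

```python
class ReplayableGenerator:
    """Hypothetical wrapper that can be "cloned" by replaying its sends.

    Cloning restarts the generator from scratch, so any expensive work done
    before the current suspension point is done again.
    """

    def __init__(self, genfunc, *args, **kwargs):
        self._genfunc = genfunc
        self._args = args
        self._kwargs = kwargs
        self._gen = genfunc(*args, **kwargs)
        self._history = []  # every value sent so far (None for next())

    def __iter__(self):
        return self

    def __next__(self):
        return self.send(None)

    def send(self, value):
        result = self._gen.send(value)
        self._history.append(value)  # only recorded if the send succeeded
        return result

    def clone(self):
        copy = ReplayableGenerator(self._genfunc, *self._args, **self._kwargs)
        for value in self._history:
            copy.send(value)  # replays (and recomputes) everything
        return copy


expensive_calls = []  # illustrative counter for the "expensive" step


def gen():
    a = yield "Hello World"
    expensive_calls.append(a)  # stand-in for the expensive "+ 1"
    a_ = a + 1
    b = yield a_
    yield b, a_


g = ReplayableGenerator(gen)
next(g)                      # 'Hello World'
g.send(42)                   # 43
new_g = g.clone()
print(new_g.send(7))         # (7, 43)
print(g.send(11))            # (11, 43)
print(len(expensive_calls))  # 2: the clone recomputed a + 1
```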
Note: `itertools.tee` won't work, because it does not handle `send`s.

Note: I only care about generators created by placing `yield` statements in a function.
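The `itertools.tee` limitation is easy to confirm: a tee iterator only supports `next()`, so there is no way to forward a value into the underlying generator, and advancing it resumes the generator with `None`. A small sketch using a trimmed-down version of the question's `gen`:

```python
import itertools


def gen():
    a = yield "Hello World"
    b = yield a + 1
    yield b


t1, t2 = itertools.tee(gen())
print(next(t1))             # Hello World
print(hasattr(t1, "send"))  # False: tee objects have no send()

# Advancing the tee resumes the generator with None, so a + 1 fails.
error = None
try:
    next(t1)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```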
Python doesn't have any support for cloning generators.
Conceptually, this should be implementable, at least for CPython. But practically, it turns out to be very hard.
Under the covers, a generator is basically nothing but a wrapper around a stack frame.*
And a frame object is essentially just a code object, an instruction pointer (an index into that code object), the builtins/globals/locals environment, an exception state, and some flags and debugging info.
And both types are exposed to the Python level,** as are all the bits they need. So, it really should be just a matter of:

- Creating a frame object just like `g.gi_frame`, but with a copy of the locals instead of the original locals. (All the user-level questions come down to whether to shallow-copy, deep-copy, or one of the above plus recursively cloning generators here.)
- Creating a generator object out of the new frame object (and its code and running flag).

And there's no obvious practical reason it shouldn't be possible to construct a frame object out of its bits, just as it is for a code object or most of the other hidden builtin types.
Unfortunately, as it turns out, Python doesn't expose a way to construct a frame object. I thought you could get around that just by using `ctypes.pythonapi` to call `PyFrame_New`, but the first argument to that is a `PyThreadState`, which you definitely can't construct from Python, and shouldn't be able to. So, to make this work, you either have to:

- Reproduce everything `PyFrame_New` does by banging on the C structs via `ctypes`, or
- Build a fake `PyThreadState` by banging on the C structs (which will still require reading the code to `PyFrame_New` carefully to know what you have to fake).

I think this may still be doable (and I plan to play with it; if I come up with anything, I'll update the Cloning generators post on my blog), but it's definitely not going to be trivial or, of course, even remotely portable.
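The asymmetry between code objects and frames is visible from plain Python. A code object can be rebuilt from its parts (here via `CodeType.replace()`, available since Python 3.8) and wrapped in a new function, but `types.FrameType` and `types.GeneratorType` refuse to be instantiated at all. A small sketch:

```python
import types


def gen():
    a = yield "Hello World"
    yield a


# Code objects (and functions) can be constructed from their parts.
code_copy = gen.__code__.replace(co_name="gen_copy")
gen_copy = types.FunctionType(code_copy, globals(), "gen_copy")
print(next(gen_copy()))  # Hello World

# Frames and generators cannot: there is no Python-level constructor.
failures = []
for tp in (types.FrameType, types.GeneratorType):
    try:
        tp()
    except TypeError as exc:
        failures.append(tp.__name__)
        print(tp.__name__, "->", exc)
```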
There are also a couple of minor problems.
Locals are exposed to Python as a dict (whether you call `locals()` for your own, or access `g.gi_frame.f_locals` for a generator you want to clone). Under the covers, locals are actually stored in an array of slots inside the frame object, not in a dict.*** You can get around this by using `ctypes.pythonapi` to call `PyFrame_LocalsToFast` and `PyFrame_FastToLocals`. But the dict just contains the values, not cell objects, so doing this shuffle will turn all nonlocal variables into local variables in the clone.****
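The value-versus-cell problem can be seen without any `ctypes`. In a generator that closes over a variable, the function's `__closure__` holds a cell object, but the frame's locals mapping only shows that cell's current value. A minimal sketch (`outer` is just an illustrative closure):

```python
def outer():
    x = 10

    def gen():
        y = yield x  # x is a free variable, stored in a cell
        yield y

    return gen


gen = outer()
g = gen()
next(g)

cell = gen.__closure__[0]
print(gen.__code__.co_freevars)  # ('x',)
print(cell.cell_contents)        # 10
snapshot = dict(g.gi_frame.f_locals)
print(snapshot)                  # {'x': 10}: the value, not the cell
```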
Exception state is exposed to Python as a type/value/traceback 3-tuple, but inside a frame there's also a borrowed (non-refcounted) reference to the owning generator (or NULL if it's not a generator frame). (The source explains why.) So, your frame-constructing function can't refcount the generator, or you have a cycle and therefore a leak, but it has to refcount the generator, or you have a potentially dangling pointer until the frame is assigned to a generator. The obvious answer seems to be to leave the generator NULL at frame construction, and have the generator-constructing function do the equivalent of `self.gi_f.f_generator = self; Py_DECREF(self)`.
* It also keeps a copy of the frame's code object and running flag, so they can be accessed after the generator exits and disposes of the frame.
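That bookkeeping is visible from Python: once a generator finishes, `gi_frame` becomes `None`, while `gi_code` and `gi_running` are still available.

```python
def gen():
    yield "Hello World"


g = gen()
print(g.gi_frame is None)  # False: created but not started yet
list(g)                    # run it to completion
print(g.gi_frame is None)  # True: the frame has been released
print(g.gi_code.co_name)   # gen
print(g.gi_running)        # False
```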
** `generator` and `frame` are hidden from builtins, but they're available as `types.GeneratorType` and `types.FrameType`. And they have docstrings, descriptions of their attributes in the `inspect` module, etc., just like function and code objects.
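For example, the `types` aliases and the `inspect` helpers work on generators and their frames like on any other introspectable object:

```python
import inspect
import types


def gen():
    yield "Hello World"


g = gen()
print(type(g) is types.GeneratorType)       # True
print(type(g.gi_frame) is types.FrameType)  # True
print(inspect.isgenerator(g), inspect.isframe(g.gi_frame))  # True True
print(inspect.getgeneratorstate(g))         # GEN_CREATED
next(g)
print(inspect.getgeneratorstate(g))         # GEN_SUSPENDED
print(inspect.getgeneratorlocals(g))        # {}
```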
*** When you compile a function definition, the compiler makes a list of all the locals, stored in `co_varnames`, and turns each variable reference into a `LOAD_FAST`/`STORE_FAST` opcode with the index into `co_varnames` as its argument. When a function call is executed, CPython (at least through 3.10) allocates the frame object with a trailing array of `PyObject *` slots, `f_localsplus`, big enough for the locals, the cell and free variables, and the value stack (`f_valuestack` points just past the variable slots), so `LOAD_FAST 0` just accesses `f_localsplus[0]`. Closures are more complicated; a bit too much to explain in a comment on an SO answer.
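The index-into-`co_varnames` part can be checked with `dis`. Opcode names vary between CPython versions (newer ones add variants such as `LOAD_FAST_CHECK`, plus fused instructions that carry two names at once), so this sketch only keeps `*_FAST` instructions that name a single local:

```python
import dis


def f(a, b):
    total = a
    total = total + b
    return total


names = f.__code__.co_varnames
print(names)  # ('a', 'b', 'total')

fast_ops = []
for instr in dis.get_instructions(f):
    # Keep only *_FAST instructions that refer to exactly one local.
    if "_FAST" in instr.opname and isinstance(instr.argval, str):
        fast_ops.append((instr.opname, instr.arg, instr.argval))
        print(instr.opname, instr.arg, "->", names[instr.arg])
```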
**** I'm assuming you wanted the clone to share the original's closure references. If you were hoping to recursively clone all the frames up the stack to get a new set of closure references to bind, that adds another problem: there's no way to construct new cell objects from Python either (at least before Python 3.8, which made cells constructible via `types.CellType`).
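Functions, unlike generators and frames, can be rebuilt from their parts, which makes the "share the closure references" behavior easy to see. Passing the original `__closure__` tuple to `types.FunctionType` yields a copy that reads and writes the very same cells. A sketch with an illustrative counter (`make_counter` and `bump` are hypothetical names):

```python
import types


def make_counter():
    count = 0

    def bump():
        nonlocal count
        count += 1
        return count

    return bump


bump = make_counter()
# Rebuild the function from its parts, reusing the original closure cells.
clone = types.FunctionType(bump.__code__, bump.__globals__, bump.__name__,
                           bump.__defaults__, bump.__closure__)

print(bump())   # 1
print(clone())  # 2: the clone updates the same shared cell
print(bump())   # 3
print(clone.__closure__[0] is bump.__closure__[0])  # True
```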