
How does yield work in Python's C code, and what are its good and bad parts?

Tags: python, yield, c

Recently I've been reading the CPython source code. I know how to use generators (next, send, and so on), but it's fun to understand them by reading the C code.

I found the code in Objects/genobject.c, and it's not that hard (though still not easy) to understand. I want to know how it really works and make sure I don't misunderstand generators in Python.

I know that everything ends up calling

static PyObject *
gen_send_ex(PyGenObject *gen, PyObject *arg, int exc)

and the result is returned from PyEval_EvalFrameEx, which appears to operate on a dynamic frame struct. Can I think of the frame as a stack, or something similar?

OK, it looks like Python stores some context in memory (am I right?). It seems that every time we call a function containing yield, Python creates a generator and stores its context in memory, although not all of the functions and variables.

I know that for a big loop or a lot of data to parse, yield is great: it saves a lot of memory and keeps the code simple. But some of my coworkers like to use yield everywhere, just like return. That makes the code hard to read and understand, and Python keeps context for functions that may never be resumed. Is it bad practice?
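To make the memory point concrete, here is a minimal sketch in Python 3 (the question itself uses Python 2 syntax). The names squares_list and squares_gen are made up for illustration. Note that sys.getsizeof only measures the container object itself, not the integers it refers to, so this is a rough comparison rather than a full memory profile.

```python
import sys

def squares_list(n):
    # Builds every value up front; the list grows with n.
    return [i * i for i in range(n)]

def squares_gen(n):
    # Produces one value per resume; only the paused frame stays alive.
    for i in range(n):
        yield i * i

n = 100_000
as_list = squares_list(n)
as_gen = squares_gen(n)

list_size = sys.getsizeof(as_list)  # size of the list object itself
gen_size = sys.getsizeof(as_gen)    # size of the generator object

print("list object is larger:", list_size > gen_size)
print("same total:", sum(as_gen) == sum(as_list))
```

The generator object stays small no matter how large n is, because it holds a single suspended frame instead of all the results.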

So, the questions are:

  1. How does PyEval_EvalFrameEx work?
  2. How much memory does yield use?
  3. Is it bad practice to use yield everywhere?

I also found that when I have a generator, gen_send_ex is called twice. Why?

def test():
    while 1:
        yield 'test here'

test().next()

It calls gen_send_ex twice: the first time with no args and the second time with args, and then I get the result.

Thanks for your patience.

asked Jul 01 '14 01:07 by GuoJing


1 Answer

I read these articles:

This article explains how PyEval_EvalFrameEx works:

http://tech.blog.aknin.name/2010/09/02/pythons-innards-hello-ceval-c-2/

This article explains the frame struct in Python:

http://tech.blog.aknin.name/2010/07/22/pythons-innards-interpreter-stacks/

Both are very important for this question.

So let me answer my own question. I don't know whether I'm right; if I have misunderstood something or am completely wrong, please let me know.

Suppose I have this code:

def gen():
    count = 0
    while count < 10:
        count += 1
        print 'call here'
        yield count

That is a very simple generator.

f = gen()

Every time we call gen(), Python creates a new generator object.

PyObject *                                                                           
PyGen_New(PyFrameObject *f)                                                          
{                                                                                    
    PyGenObject *gen = PyObject_GC_New(PyGenObject, &PyGen_Type);                    
    if (gen == NULL) {                                                               
        Py_DECREF(f);                                                                
        return NULL;                                                                 
    }                                                                                
    gen->gi_frame = f;                                                               
    Py_INCREF(f->f_code);                                                            
    gen->gi_code = (PyObject *)(f->f_code);                                          
    gen->gi_running = 0;                                                             
    gen->gi_weakreflist = NULL;                                                      
    _PyObject_GC_TRACK(gen);                                                         
    return (PyObject *)gen;                                                          
}

We can see that it initializes a generator object and stores the frame f, which is passed in by the caller, as gi_frame.
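The same behavior can be observed from Python code. The sketch below is written for Python 3 (an assumption, since the code above is Python 2) and uses inspect.getgeneratorstate from the standard library. The calls list is only a helper for this illustration. It shows that calling the generator function creates a fresh object each time and does not run any of the body.

```python
import inspect

calls = []

def gen():
    # This line runs on the first next(), not when gen() is called.
    calls.append("body started")
    yield 1

g1 = gen()
g2 = gen()

distinct = g1 is not g2          # each call builds its own generator object
ran_on_creation = bool(calls)    # the body has not started yet

state_before = inspect.getgeneratorstate(g1)
first = next(g1)
state_after = inspect.getgeneratorstate(g1)

print(distinct, ran_on_creation, state_before, first, state_after)
```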

Anything we do with it, such as f.send() or f.next(), ends up calling gen_send_ex, as the code below shows:

static PyObject *                                                                    
gen_iternext(PyGenObject *gen)                                                                                                                                           
{                                                                                    
    return gen_send_ex(gen, NULL, 0);                                                
}

static PyObject *                                                                    
gen_send(PyGenObject *gen, PyObject *arg)                                            
{                                                                                    
    return gen_send_ex(gen, arg, 0);                                                 
}

The only difference between the two functions is arg: send passes an argument, while next passes NULL.

The code of gen_send_ex is below:
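At the Python level this means that advancing a generator with next() and with send(None) should behave the same way. A small sketch in Python 3, where the built-in next(g) replaces the Python 2 method g.next(); the counter function is made up for this example:

```python
def counter():
    n = 0
    while True:
        n += 1
        yield n

a = counter()
b = counter()

# Advance one generator with next() and the other with send(None).
from_next = [next(a), next(a), next(a)]
from_send = [b.send(None), b.send(None), b.send(None)]

print(from_next, from_send)
```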

static PyObject *
gen_send_ex(PyGenObject *gen, PyObject *arg, int exc)
{
    PyThreadState *tstate = PyThreadState_GET();
    PyFrameObject *f = gen->gi_frame;
    PyObject *result;

    if (gen->gi_running) {
        fprintf(stderr, "gi init\n");
        PyErr_SetString(PyExc_ValueError,
                        "generator already executing");
        return NULL;
    }
    if (f==NULL || f->f_stacktop == NULL) {
        fprintf(stderr, "check stack\n");
        /* Only set exception if called from send() */
        if (arg && !exc)
            PyErr_SetNone(PyExc_StopIteration);
        return NULL;
    }

    if (f->f_lasti == -1) {
        fprintf(stderr, "f->f_lasti\n");
        if (arg && arg != Py_None) {
            fprintf(stderr, "something here\n");
            PyErr_SetString(PyExc_TypeError,
                            "can't send non-None value to a "
                            "just-started generator");
            return NULL;
        }
    } else {
        /* Push arg onto the frame's value stack */
        fprintf(stderr, "frame\n");
        if(arg) {
            /* fprintf arg */
        }
        result = arg ? arg : Py_None;
        Py_INCREF(result);
        *(f->f_stacktop++) = result;
    }

    fprintf(stderr, "here\n");
    /* Generators always return to their most recent caller, not
     * necessarily their creator. */
    Py_XINCREF(tstate->frame);
    assert(f->f_back == NULL);
    f->f_back = tstate->frame;

    gen->gi_running = 1;
    result = PyEval_EvalFrameEx(f, exc);
    gen->gi_running = 0;

    /* Don't keep the reference to f_back any longer than necessary.  It
     * may keep a chain of frames alive or it could create a reference
     * cycle. */
    assert(f->f_back == tstate->frame);
    Py_CLEAR(f->f_back);

    /* If the generator just returned (as opposed to yielding), signal
     * that the generator is exhausted. */
    if (result == Py_None && f->f_stacktop == NULL) {
        fprintf(stderr, "here2\n");
        Py_DECREF(result);
        result = NULL;
        /* Set exception if not called by gen_iternext() */
        if (arg)
            PyErr_SetNone(PyExc_StopIteration);
    }

    if (!result || f->f_stacktop == NULL) {
        fprintf(stderr, "here3\n");
        /* generator can't be rerun, so release the frame */
        Py_DECREF(f);
        gen->gi_frame = NULL;
    }
    fprintf(stderr, "return result\n");
    return result;
}

It looks like the generator object is a controller of its own frame, which is called gi_frame.

I added some fprintf(...) calls, so let's run the code.
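The gi_frame attribute is also visible from Python, so the "release the frame" step at the end of gen_send_ex can be observed directly. A minimal sketch, assuming CPython 3; the one_shot function is hypothetical:

```python
def one_shot():
    yield "only value"

g = one_shot()
frame_before = g.gi_frame      # a frame object while the generator can still run
same_code = g.gi_code is one_shot.__code__

value = next(g)

exhausted = False
try:
    next(g)                    # the body returns, so StopIteration is raised
except StopIteration:
    exhausted = True

frame_after = g.gi_frame       # None once the generator has finished

print(frame_before is not None, same_code, value, exhausted, frame_after)
```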

f.next()

f->f_lasti
here
call here
return result
1

So, it first enters the f_lasti branch (f_lasti is an integer offset into the bytecode of the last instruction executed, initialized to -1). It is indeed -1, but there is no arg, so the function goes on.
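The TypeError in that branch can be triggered from Python by sending a non-None value before the generator has started. A short sketch in Python 3 (the echo function is made up for illustration); after the failed send, the generator has still not started, so a plain next() works:

```python
def echo():
    received = yield "ready"
    yield received

g = echo()

message = None
try:
    g.send("too early")        # not started yet, so no yield is waiting for a value
except TypeError as err:
    message = str(err)

first = next(g)                # the generator can still be started normally

print(message)
print(first)
```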

Then it reaches here. The most important thing now is PyEval_EvalFrameEx, which implements CPython's evaluation loop. We can think of it as running all the code (in fact, the Python opcodes); it runs the line print 'call here' and prints the text.

When execution reaches yield, Python keeps the context in the frame object (search for "call stack" for background). It then hands the value back and gives up control.
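The saved context can be inspected between calls through the frame's f_locals. A minimal sketch in Python 3, using the same gen as above without the print statement:

```python
def gen():
    count = 0
    while count < 10:
        count += 1
        yield count

g = gen()
next(g)
next(g)

# The local variable survives inside the paused frame.
saved_count = g.gi_frame.f_locals["count"]

# Resuming continues from the saved state.
third = next(g)

print(saved_count, third)
```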

After all of that, gen_send_ex prints return result, and the value 1 is shown in the terminal.

The next time we call next(), it does not enter the f_lasti branch. It prints:

frame
here
call here
return result
2

We did not send an arg, so None is pushed onto the frame's value stack as the value of the yield expression. The result still comes from PyEval_EvalFrameEx, and this time it is 2.
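The value pushed onto the frame's stack is what the yield expression evaluates to inside the generator. A small sketch in Python 3 that records it; the show_received function is hypothetical:

```python
def show_received():
    log = []
    while True:
        # The yield expression evaluates to whatever the caller passed in.
        received = yield log
        log.append(received)

g = show_received()
next(g)                 # start the generator; nothing is received yet
next(g)                 # resume without an argument: yield evaluates to None
final = g.send(5)       # resume with an argument: yield evaluates to 5

print(final)
```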

answered Oct 19 '22 04:10 by GuoJing