
How to create a generator/iterator with the Python C API?

How do I replicate the following Python code with the Python C API?

```python
class Sequence():
    def __init__(self, max):
        self.max = max

    def data(self):
        i = 0
        while i < self.max:
            yield i
            i += 1
```

So far, I have this:

```c
#include <Python.h>
#include <structmember.h>

/* Define a new object class, Sequence. */
typedef struct {
    PyObject_HEAD
    unsigned long max;
} SequenceObject;

/* Instance variables */
static PyMemberDef Sequence_members[] = {
    {"max", T_ULONG, offsetof(SequenceObject, max), 0, NULL},
    {NULL} /* Sentinel */
};

static int Sequence_init(SequenceObject *self, PyObject *args, PyObject *kwds)
{
    /* "k" converts to unsigned long, matching the type of |max|. */
    if (!PyArg_ParseTuple(args, "k", &self->max)) {
        return -1;
    }
    return 0;
}

static PyObject *Sequence_data(SequenceObject *self, PyObject *args);

/* Methods */
static PyMethodDef Sequence_methods[] = {
    {"data", (PyCFunction)Sequence_data, METH_NOARGS,
     "sequence.data() -> iterator object\n"
     "Returns iterator of range [0, sequence.max)."},
    {NULL} /* Sentinel */
};

/* Define new object type */
PyTypeObject Sequence_Type = {
   PyObject_HEAD_INIT(NULL)
   0,                         /* ob_size */
   "Sequence",                /* tp_name */
   sizeof(SequenceObject),    /* tp_basicsize */
   0,                         /* tp_itemsize */
   0,                         /* tp_dealloc */
   0,                         /* tp_print */
   0,                         /* tp_getattr */
   0,                         /* tp_setattr */
   0,                         /* tp_compare */
   0,                         /* tp_repr */
   0,                         /* tp_as_number */
   0,                         /* tp_as_sequence */
   0,                         /* tp_as_mapping */
   0,                         /* tp_hash */
   0,                         /* tp_call */
   0,                         /* tp_str */
   0,                         /* tp_getattro */
   0,                         /* tp_setattro */
   0,                         /* tp_as_buffer */
   Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags*/
   "Test generator object",   /* tp_doc */
   0,                         /* tp_traverse */
   0,                         /* tp_clear */
   0,                         /* tp_richcompare */
   0,                         /* tp_weaklistoffset */
   0,                         /* tp_iter */
   0,                         /* tp_iternext */
   Sequence_methods,          /* tp_methods */
   Sequence_members,          /* tp_members */
   0,                         /* tp_getset */
   0,                         /* tp_base */
   0,                         /* tp_dict */
   0,                         /* tp_descr_get */
   0,                         /* tp_descr_set */
   0,                         /* tp_dictoffset */
   (initproc)Sequence_init,   /* tp_init */
   0,                         /* tp_alloc */
   PyType_GenericNew,         /* tp_new */
};

static PyObject *Sequence_data(SequenceObject *self, PyObject *args)
{
    /* Now what? */
}
```

But I'm not sure where to go next. Could anyone offer some suggestions?

Edit

I suppose the main problem I'm having is simulating the yield statement. As I understand it, yield is simple-looking but in reality complex: it creates a generator with its own __iter__() and next() methods, which are called automatically. Searching through the docs, it seems to be associated with PyGenObject; however, how to create a new instance of this object is unclear. PyGen_New() takes a PyFrameObject as its argument, and the only way I can find to get one is PyEval_GetFrame(), which doesn't seem to be what I want (or am I mistaken?). Does anyone have any experience with this they can share?
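A generator's state is its suspended frame: the local variables plus the point where execution paused. Without yield, one way to get the same effect in C is to keep that state in a struct and resume from it on every call. The plain-C sketch below models Sequence.data() this way; it does not use Python.h, and DataFrame, data_frame_init and data_frame_resume are hypothetical names. The switch jumps back into the middle of the while loop at the saved resume point, which is legal C.

```c
#include <stdbool.h>

/* Hypothetical: the frame of Sequence.data() stored explicitly in a struct. */
typedef struct {
    int resume_point;   /* 0 = not started, 1 = paused at the yield, 2 = finished */
    unsigned long i;    /* the generator's local variable */
    unsigned long max;  /* copied from self.max */
} DataFrame;

static void data_frame_init(DataFrame *f, unsigned long max)
{
    f->resume_point = 0;
    f->i = 0;
    f->max = max;
}

/* Runs the body until the next "yield". Returns true and stores the yielded
 * value in *out, or returns false once the body has finished (the point
 * where a Python generator would raise StopIteration). */
static bool data_frame_resume(DataFrame *f, unsigned long *out)
{
    switch (f->resume_point) {
    case 0:
        f->i = 0;                      /* i = 0 */
        while (f->i < f->max) {        /* while i < self.max: */
            f->resume_point = 1;
            *out = f->i;               /*     yield i */
            return true;
    case 1:
            f->i += 1;                 /*     i += 1 */
        }
        f->resume_point = 2;
        return false;
    default:
        return false;                  /* already finished */
    }
}
```

A Python-facing version would presumably call something like data_frame_resume from a next function and turn a false result into StopIteration.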

Further Edit

I found this to be clearer when I (essentially) expanded what Python was doing behind the scenes:

```python
class IterObject():
    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.i = 0
        return self

    def next(self):
        if self.i >= self.max:
            raise StopIteration
        self.i += 1
        return self.i

class Sequence():
    def __init__(self, max):
        self.max = max

    def data(self):
        return IterObject(self.max)
```

Technically, the sequence is off by one (it yields 1 through max rather than 0 through max - 1), but you get the idea.
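The off-by-one comes from the order of operations in next(): it increments self.i before returning it. The plain-C sketch below (no Python.h; IterState and the two next functions are hypothetical) shows both orders side by side. Returning the current value first gives the same 0 through max - 1 sequence as the original generator.

```c
#include <stdbool.h>

/* Hypothetical C counterpart of IterObject's state. */
typedef struct {
    unsigned long i;
    unsigned long max;
} IterState;

static void iter_start(IterState *st, unsigned long max)
{
    st->i = 0;
    st->max = max;
}

/* Mirrors IterObject.next: increment, then return. Yields 1 .. max. */
static bool next_increment_first(IterState *st, unsigned long *out)
{
    if (st->i >= st->max)
        return false;              /* where next() raises StopIteration */
    st->i += 1;
    *out = st->i;
    return true;
}

/* Return the current value, then increment. Yields 0 .. max-1, like range(max). */
static bool next_return_first(IterState *st, unsigned long *out)
{
    if (st->i >= st->max)
        return false;
    *out = st->i;
    st->i += 1;
    return true;
}
```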

The only problem with this is that it's very annoying to create a new object every time one needs a generator, even more so in C than in Python because of the required monstrosity that comes with defining a new type. And there can be no yield statement in C because C has no closures. What I did instead (since I couldn't find it in the Python API; please point me to a standard object if it already exists!) was create a simple, generic generator object class that calls back a C function for every next() method call. Here it is (note that I have not yet tried compiling this because it is not complete; see below):

```c
#include <Python.h>
#include <structmember.h>
#include <stdbool.h>
#include <stdlib.h>

/* A convenient, generic generator object. */

typedef PyObject *(*PyGeneratorCallback)(PyObject *callee, void *info);

typedef struct {
    PyObject_HEAD
    PyGeneratorCallback callback;
    PyObject *callee;
    void *callbackInfo; /* info to be passed along to callback function. */
    bool freeInfo; /* true if |callbackInfo| should be free()'d when object
                    * dealloc's, false if not. */
} GeneratorObject;

static PyObject *Generator_iter(PyObject *self, PyObject *args)
{
    Py_INCREF(self);
    return self;
}

static PyObject *Generator_next(PyObject *self, PyObject *args)
{
    GeneratorObject *gen = (GeneratorObject *)self;
    return gen->callback(gen->callee, gen->callbackInfo);
}

static PyMethodDef Generator_methods[] = {
    {"__iter__", (PyCFunction)Generator_iter, METH_NOARGS, NULL},
    {"next", (PyCFunction)Generator_next, METH_NOARGS, NULL},
    {NULL} /* Sentinel */
};

static void Generator_dealloc(GeneratorObject *self)
{
    if (self->freeInfo && self->callbackInfo != NULL) {
        free(self->callbackInfo);
    }
    self->ob_type->tp_free((PyObject *)self);
}

PyTypeObject Generator_Type = {
   PyObject_HEAD_INIT(NULL)
   0,                         /* ob_size */
   "Generator",               /* tp_name */
   sizeof(GeneratorObject),   /* tp_basicsize */
   0,                         /* tp_itemsize */
   (destructor)Generator_dealloc, /* tp_dealloc */
   0,                         /* tp_print */
   0,                         /* tp_getattr */
   0,                         /* tp_setattr */
   0,                         /* tp_compare */
   0,                         /* tp_repr */
   0,                         /* tp_as_number */
   0,                         /* tp_as_sequence */
   0,                         /* tp_as_mapping */
   0,                         /* tp_hash */
   0,                         /* tp_call */
   0,                         /* tp_str */
   0,                         /* tp_getattro */
   0,                         /* tp_setattro */
   0,                         /* tp_as_buffer */
   Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags*/
   0,                         /* tp_doc */
   0,                         /* tp_traverse */
   0,                         /* tp_clear */
   0,                         /* tp_richcompare */
   0,                         /* tp_weaklistoffset */
   0,                         /* tp_iter */
   0,                         /* tp_iternext */
   Generator_methods,         /* tp_methods */
   0,                         /* tp_members */
   0,                         /* tp_getset */
   0,                         /* tp_base */
   0,                         /* tp_dict */
   0,                         /* tp_descr_get */
   0,                         /* tp_descr_set */
   0,                         /* tp_dictoffset */
   0,                         /* tp_init */
   0,                         /* tp_alloc */
   PyType_GenericNew,         /* tp_new */
};

/* Returns a new generator object with the given callback function
 * and arguments. */
PyObject *Generator_New(PyObject *callee, void *info,
                        bool freeInfo, PyGeneratorCallback callback)
{
    GeneratorObject *generator = PyObject_New(GeneratorObject, &Generator_Type);
    if (generator == NULL) return NULL;

    generator->callee = callee;
    generator->callbackInfo = info;
    generator->callback = callback;
    generator->freeInfo = freeInfo;

    return (PyObject *)generator;
}

/* End of Generator definition. */

/* Define a new object class, Sequence. */
typedef struct {
    PyObject_HEAD
    unsigned long max;
} SequenceObject;

/* Instance variables */
static PyMemberDef Sequence_members[] = {
    {"max", T_ULONG, offsetof(SequenceObject, max), 0, NULL},
    {NULL} /* Sentinel */
};

static int Sequence_init(SequenceObject *self, PyObject *args, PyObject *kwds)
{
    if (!PyArg_ParseTuple(args, "k", &self->max)) {
        return -1;
    }
    return 0;
}

static PyObject *Sequence_data(SequenceObject *self, PyObject *args);

/* Methods */
static PyMethodDef Sequence_methods[] = {
    {"data", (PyCFunction)Sequence_data, METH_NOARGS,
     "sequence.data() -> iterator object\n"
     "Returns generator of range [0, sequence.max)."},
    {NULL} /* Sentinel */
};

/* Define new object type */
PyTypeObject Sequence_Type = {
   PyObject_HEAD_INIT(NULL)
   0,                         /* ob_size */
   "Sequence",                /* tp_name */
   sizeof(SequenceObject),    /* tp_basicsize */
   0,                         /* tp_itemsize */
   0,                         /* tp_dealloc */
   0,                         /* tp_print */
   0,                         /* tp_getattr */
   0,                         /* tp_setattr */
   0,                         /* tp_compare */
   0,                         /* tp_repr */
   0,                         /* tp_as_number */
   0,                         /* tp_as_sequence */
   0,                         /* tp_as_mapping */
   0,                         /* tp_hash */
   0,                         /* tp_call */
   0,                         /* tp_str */
   0,                         /* tp_getattro */
   0,                         /* tp_setattro */
   0,                         /* tp_as_buffer */
   Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags*/
   "Test generator object",   /* tp_doc */
   0,                         /* tp_traverse */
   0,                         /* tp_clear */
   0,                         /* tp_richcompare */
   0,                         /* tp_weaklistoffset */
   0,                         /* tp_iter */
   0,                         /* tp_iternext */
   Sequence_methods,          /* tp_methods */
   Sequence_members,          /* tp_members */
   0,                         /* tp_getset */
   0,                         /* tp_base */
   0,                         /* tp_dict */
   0,                         /* tp_descr_get */
   0,                         /* tp_descr_set */
   0,                         /* tp_dictoffset */
   (initproc)Sequence_init,   /* tp_init */
   0,                         /* tp_alloc */
   PyType_GenericNew,         /* tp_new */
};

static PyObject *Sequence_data_next_callback(PyObject *self, void *info);

static PyObject *Sequence_data(SequenceObject *self, PyObject *args)
{
    unsigned long *info = malloc(sizeof(unsigned long));
    if (info == NULL) return PyErr_NoMemory();
    *info = 0;

    /* |info| will be free()'d by the returned generator object. */
    PyObject *ret = Generator_New((PyObject *)self, info, true,
                                  &Sequence_data_next_callback);
    if (ret == NULL) {
        free(info); /* Watch out for memory leaks! */
    }
    return ret;
}

static PyObject *Sequence_data_next_callback(PyObject *self, void *info)
{
    SequenceObject *seq = (SequenceObject *)self;
    unsigned long *i = (unsigned long *)info; /* counter persists between calls */
    if (*i >= seq->max) {
        return NULL; /* TODO: How do I raise StopIteration here? I can't seem to find
                      *       a standard exception. */
    } else {
        return Py_BuildValue("k", (*i)++);
    }
}
```
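The callback-driven design described above can be modeled in plain C without Python.h. In this sketch (Generator, gen_callback, RangeState and range_callback are hypothetical names), the callback receives a pointer to its state and advances the counter through that pointer, so the position persists between calls; a false return reports exhaustion, which in the real type would have to become a StopIteration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: returns true and stores the next value in *out,
 * or returns false when there are no more values. */
typedef bool (*gen_callback)(void *info, long *out);

/* Generic generator: just a callback plus the state it operates on. */
typedef struct {
    gen_callback callback;
    void *info;
} Generator;

static bool generator_next(Generator *g, long *out)
{
    return g->callback(g->info, out);
}

/* Example state and callback: counts from 0 up to (not including) max. */
typedef struct {
    long i;
    long max;
} RangeState;

static bool range_callback(void *info, long *out)
{
    RangeState *st = info;
    if (st->i >= st->max)
        return false;
    *out = st->i++;   /* update through the pointer so the count is kept */
    return true;
}
```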

However, unfortunately, I'm still not finished. The only question I have left is: How do I raise a StopIteration exception with the C API? I can't seem to find it listed in the Standard Exceptions. Also, perhaps more importantly, is this the correct way to approach this problem?

Thanks to anyone that's still following this.

asked Nov 29 '09 at 15:11 by Michael





2 Answers

Below is a simple implementation of a module spam with one function, myiter(int), that returns an iterator:

```python
import spam

for i in spam.myiter(10):
    print i
```

It prints the numbers from 0 to 9.

It is simpler than your case but shows the main points: defining an object with the standard __iter__() and next() methods, and implementing iterator behaviour, including raising StopIteration when appropriate.

In your case, the iterator object needs to hold a reference to the Sequence (so you'll need a deallocator method for it that calls Py_DECREF on that reference). The sequence itself needs to implement __iter__() and create an iterator inside it.
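The ownership rule in the paragraph above can be modeled with a hand-rolled reference count. In this plain-C sketch (Seq, SeqIter and the seq_* functions are hypothetical; refcnt++ and seq_decref stand in for Py_INCREF and Py_DECREF), the iterator takes a reference when it is created and releases it in its free function, so the sequence stays alive even after the caller drops its own reference.

```c
#include <stdlib.h>

/* Hypothetical miniature reference-counted sequence. */
typedef struct {
    long refcnt;
    unsigned long max;
} Seq;

/* Iterator holding a strong reference to its sequence. */
typedef struct {
    Seq *seq;
    unsigned long i;
} SeqIter;

static Seq *seq_new(unsigned long max)
{
    Seq *s = malloc(sizeof *s);
    if (s != NULL) {
        s->refcnt = 1;      /* the creator owns one reference */
        s->max = max;
    }
    return s;
}

static void seq_decref(Seq *s)          /* stands in for Py_DECREF */
{
    if (--s->refcnt == 0)
        free(s);
}

static SeqIter *seq_iter_new(Seq *s)
{
    SeqIter *it = malloc(sizeof *it);
    if (it == NULL)
        return NULL;
    s->refcnt++;                        /* stands in for Py_INCREF */
    it->seq = s;
    it->i = 0;
    return it;
}

/* Returns 1 and stores the next value, or 0 when exhausted. */
static int seq_iter_next(SeqIter *it, unsigned long *out)
{
    if (it->i >= it->seq->max)
        return 0;
    *out = it->i++;
    return 1;
}

static void seq_iter_free(SeqIter *it)  /* plays the role of tp_dealloc */
{
    seq_decref(it->seq);                /* release the iterator's reference */
    free(it);
}
```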


The structure containing the state of the iterator. (In your version, instead of m it would hold a reference to the Sequence.)

```c
#include <Python.h>

typedef struct {
  PyObject_HEAD
  long int m;
  long int i;
} spam_MyIter;
```

The iterator's __iter__() method. It simply returns self, which allows iterators and collections to be treated the same way in constructs like for ... in ....

```c
PyObject* spam_MyIter_iter(PyObject *self)
{
  Py_INCREF(self);
  return self;
}
```

The implementation of the iteration itself, the next() method:

```c
PyObject* spam_MyIter_iternext(PyObject *self)
{
  spam_MyIter *p = (spam_MyIter *)self;
  if (p->i < p->m) {
    PyObject *tmp = Py_BuildValue("l", p->i);
    (p->i)++;
    return tmp;
  } else {
    /* Raising of standard StopIteration exception with empty value. */
    PyErr_SetNone(PyExc_StopIteration);
    return NULL;
  }
}
```
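In the C API, a NULL return from the next function covers two different outcomes: exhaustion (StopIteration set, as above, or no exception at all) and a genuine error (some other exception set); the caller tells them apart by checking the exception. The plain-C sketch below makes the three outcomes explicit with an enum instead. StepResult, DigitIter, digit_next and sum_digits are hypothetical names, and the consumer loop mimics how a for loop stops quietly on exhaustion but propagates errors.

```c
#include <stddef.h>

typedef enum {
    STEP_VALUE,   /* a value was produced */
    STEP_DONE,    /* exhausted: the StopIteration case */
    STEP_ERROR    /* failure: the "some other exception" case */
} StepResult;

/* Hypothetical iterator over the decimal digits of a string. */
typedef struct {
    const char *s;
    size_t pos;
} DigitIter;

static StepResult digit_next(DigitIter *it, int *out)
{
    char c = it->s[it->pos];
    if (c == '\0')
        return STEP_DONE;
    if (c < '0' || c > '9')
        return STEP_ERROR;
    it->pos++;
    *out = c - '0';
    return STEP_VALUE;
}

/* Drains the iterator like a for loop: stops quietly on STEP_DONE,
 * propagates STEP_ERROR. Returns the digit sum, or -1 on error. */
static long sum_digits(const char *s)
{
    DigitIter it = { s, 0 };
    long total = 0;
    int d;
    for (;;) {
        StepResult r = digit_next(&it, &d);
        if (r == STEP_DONE)
            return total;
        if (r == STEP_ERROR)
            return -1;
        total += d;
    }
}
```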

We need the extended version of the PyTypeObject structure to provide Python with information about __iter__() and next(). We want them to be called efficiently, so they are stored in type slots rather than looked up by name in a dictionary.

```c
static PyTypeObject spam_MyIterType = {
    PyObject_HEAD_INIT(NULL)
    0,                         /*ob_size*/
    "spam._MyIter",            /*tp_name*/
    sizeof(spam_MyIter),       /*tp_basicsize*/
    0,                         /*tp_itemsize*/
    0,                         /*tp_dealloc*/
    0,                         /*tp_print*/
    0,                         /*tp_getattr*/
    0,                         /*tp_setattr*/
    0,                         /*tp_compare*/
    0,                         /*tp_repr*/
    0,                         /*tp_as_number*/
    0,                         /*tp_as_sequence*/
    0,                         /*tp_as_mapping*/
    0,                         /*tp_hash */
    0,                         /*tp_call*/
    0,                         /*tp_str*/
    0,                         /*tp_getattro*/
    0,                         /*tp_setattro*/
    0,                         /*tp_as_buffer*/
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_ITER,
      /* tp_flags: Py_TPFLAGS_HAVE_ITER tells python to
         use tp_iter and tp_iternext fields. */
    "Internal myiter iterator object.",           /* tp_doc */
    0,  /* tp_traverse */
    0,  /* tp_clear */
    0,  /* tp_richcompare */
    0,  /* tp_weaklistoffset */
    spam_MyIter_iter,  /* tp_iter: __iter__() method */
    spam_MyIter_iternext  /* tp_iternext: next() method */
};
```
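To illustrate why the type slots matter: a slot is a function pointer at a fixed place in the type struct, so the interpreter reaches it directly through the object's type, while a call by method name needs a lookup first. The plain-C sketch below contrasts the two; MiniType, MethodEntry, find_method and Counter are hypothetical, and the linear search in find_method is only a stand-in for CPython's dictionary lookup.

```c
#include <stddef.h>
#include <string.h>

/* Signature shared by every "next" implementation in this sketch. */
typedef long (*next_fn)(void *self);

/* Name-based route: search a method table for "next" before each call. */
typedef struct {
    const char *name;
    next_fn fn;
} MethodEntry;

static next_fn find_method(const MethodEntry *table, const char *name)
{
    for (; table->name != NULL; table++) {
        if (strcmp(table->name, name) == 0)
            return table->fn;
    }
    return NULL;
}

/* Slot-based route: the function pointer lives in a fixed field of the type. */
typedef struct {
    const char *tp_name;
    next_fn tp_iternext;
} MiniType;

typedef struct {
    const MiniType *type;   /* every object points at its type */
    long i;
} Counter;

static long counter_next(void *self)
{
    Counter *c = self;
    return c->i++;
}

static const MethodEntry counter_methods[] = {
    {"next", counter_next},
    {NULL, NULL}            /* sentinel, as in PyMethodDef tables */
};

static const MiniType counter_type = {"Counter", counter_next};
```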

The myiter(int) function creates the iterator.

```c
static PyObject *
spam_myiter(PyObject *self, PyObject *args)
{
  long int m;
  spam_MyIter *p;

  if (!PyArg_ParseTuple(args, "l", &m))  return NULL;

  /* I don't need python callable __init__() method for this iterator,
     so I'll simply allocate it as PyObject and initialize it by hand. */

  p = PyObject_New(spam_MyIter, &spam_MyIterType);
  if (!p) return NULL;

  /* PyObject_New has already initialized the object header (type and
     reference count), so no separate PyObject_Init call is needed. */

  p->m = m;
  p->i = 0;
  return (PyObject *)p;
}
```

The rest is pretty boring...

```c
static PyMethodDef SpamMethods[] = {
    {"myiter",  spam_myiter, METH_VARARGS, "Iterate from i=0 while i<m."},
    {NULL, NULL, 0, NULL}        /* Sentinel */
};

PyMODINIT_FUNC
initspam(void)
{
  PyObject* m;

  spam_MyIterType.tp_new = PyType_GenericNew;
  if (PyType_Ready(&spam_MyIterType) < 0)  return;

  m = Py_InitModule("spam", SpamMethods);
  if (m == NULL)  return;

  Py_INCREF(&spam_MyIterType);
  PyModule_AddObject(m, "_MyIter", (PyObject *)&spam_MyIterType);
}
```
answered Oct 12 '22 at 03:10 by Tomek Szpakowicz



In Sequence_data, you must either return a new PyInt instance or raise a StopIteration exception, which tells the code outside that there are no more values. See PEP 255 and section 9.10, Generators, for details.

See the Iterator Protocol section of the Python/C API reference for helper functions.

answered Oct 12 '22 at 01:10 by Aaron Digulla
