
Pickling of a namedtuple instance succeeds normally, but fails when module is Cythonized

I have a module whose only file, mod.py, defines two types: foo, a namedtuple, and bar, an ordinary class. I can create instances of both foo and bar and pickle them without issue. I am now trying to Cythonize the module so that I can distribute it in compiled form.

The module file structure looks like:

./mod.pyx
./setup.py
./demo.py

The content of mod.pyx is:

import collections

foo = collections.namedtuple('foo', 'A B')

class bar:

    def __init__(self,A,B):
        self.A = A
        self.B = B

The content of setup.py is:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize

setup(
    ext_modules=cythonize([Extension('mod', ['mod.pyx'])])
)

I cythonize it using the command python setup.py build_ext --inplace, which creates the compiled module file:

./mod.cp37-win_amd64.pyd

Running the following demo.py:

import mod, pickle
ham = mod.foo(1,2)
spam = mod.bar(1,2)

print(pickle.dumps(spam))
print(pickle.dumps(ham))

Successfully pickles spam, the instance of class bar, but fails on ham, the instance of namedtuple foo, with the error message:

PicklingError: Can't pickle <class 'importlib._bootstrap.foo'>: attribute lookup foo on importlib._bootstrap failed

This is all done in Python 3.7, if it matters. It seems that pickle can no longer find the class definition of mod.foo, even though Python can create an instance without issue. I know namedtuple behaves oddly with respect to the naming of the class it returns, and I admit I am a relative novice at packaging Cython modules.

A bit of googling turned up a few known issues with namedtuples and Cython, so I'm wondering if this might be part of a known issue, or if I am just packaging my module incorrectly.

asked Mar 18 '19 15:03 by Porksodaguy


1 Answer

For pickle to work, the __module__ attribute of the foo type must be set to the name of the module from which foo can be imported, which here is mod.

namedtuple uses a heuristic (a lookup in sys._getframe(1).f_globals) to get this information:

def namedtuple(typename, field_names, *, rename=False, defaults=None, module=None):
    ...
    # For pickling to work, the __module__ variable needs to be set to the frame
    # where the named tuple is created.  Bypass this step in environments where
    # sys._getframe is not defined (Jython for example) or sys._getframe is not
    # defined for arguments greater than 0 (IronPython), or where the user has
    # specified a particular module.
    if module is None:
        try:
            module = _sys._getframe(1).f_globals.get('__name__', '__main__')
        except (AttributeError, ValueError):
            pass
    if module is not None:
        result.__module__ = module
    ...

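The problem with Cython or C extensions is that this heuristic does not work: _sys._getframe(1).f_globals.get('__name__', '__main__') yields importlib._bootstrap instead of mod. This is presumably because the compiled module code does not create a regular Python frame of its own, so the nearest Python frame above namedtuple belongs to the import machinery.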
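The frame lookup itself can be tried in pure Python. The following is only a sketch: guess_caller_module and the namespace name pretend_mod are made up for illustration and are not part of collections. It shows that the result depends entirely on the globals of whichever Python frame sits directly above the function, and that a missing __name__ falls back to '__main__'.

```python
import sys


def guess_caller_module():
    # Same lookup that namedtuple performs: read '__name__' from the
    # globals of the frame one level above this function.
    return sys._getframe(1).f_globals.get('__name__', '__main__')


# Hypothetical namespace standing in for a module called 'pretend_mod'.
namespace = {'__name__': 'pretend_mod', 'guess': guess_caller_module}
exec("detected = guess()", namespace)

# A namespace without '__name__' triggers the '__main__' fallback.
anonymous = {'guess': guess_caller_module}
exec("detected = guess()", anonymous)

print(namespace['detected'])
print(anonymous['detected'])
```

If the calling frame is not the one that defines the type, as appears to be the case for the compiled module, the lookup reports whatever module that other frame belongs to.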

To fix that, you need to pass the right module name to the namedtuple factory (as pointed out in the code comments), i.e.:

foo = collections.namedtuple('foo', 'A B', module='mod')

or to keep it more generic:

foo = collections.namedtuple('foo', 'A B', module=__name__)

Now, after importing, foo.__module__ is mod as expected by pickle and ham can be pickled.
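The effect of __module__ on pickling can be reproduced without Cython. The sketch below is an approximation under stated assumptions: demo_mod is a hypothetical stand-in for the compiled module, created with types.ModuleType, and the wrong module name is set by hand to mimic what the frame heuristic produces in the compiled case.

```python
import collections
import pickle
import sys
import types

# Hypothetical stand-in for the compiled module (not the real 'mod').
demo_mod = types.ModuleType('demo_mod')
sys.modules['demo_mod'] = demo_mod

# Mimic the failing case: __module__ names a module that has no such attribute.
foo_wrong = collections.namedtuple('foo_wrong', 'A B',
                                   module='importlib._bootstrap')
try:
    pickle.dumps(foo_wrong(1, 2))
    error = None
except pickle.PicklingError as exc:
    error = exc

# Mimic the fixed case: __module__ names the module that really holds the type.
demo_mod.foo = collections.namedtuple('foo', 'A B', module='demo_mod')
restored = pickle.loads(pickle.dumps(demo_mod.foo(1, 2)))

print(error)
print(restored)
```

The first attempt fails because pickle stores classes by reference and cannot find foo_wrong in importlib._bootstrap; the second round-trips because demo_mod.foo resolves to the same type object.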


By the way, pickling of bar works because Cython explicitly sets the right __module__ attribute (i.e. mod) while constructing the class.

answered Nov 14 '22 23:11 by ead