
How does Python 3 know how to pickle extension types, especially Numpy arrays?

Numpy arrays, being extension types (i.e. defined in extensions using the C API), declare additional fields outside the scope of the Python interpreter (for example the data attribute, which is a buffer structure, as documented in Numpy's array interface).
To serialize them, Python 2 used the __reduce__ method as part of the pickle protocol, as stated in the doc, and explained here.

But, even if __reduce__ still exists in Python 3, the Pickle protocol section (and Pickling and unpickling extension types a fortiori) was removed from the doc, so it is unclear what does what.
Moreover, there are additional entries that relate to pickling extension types:

  • copyreg, described as a Pickle interface constructor registration for extension types, but there's no mention of extension types in the copyreg module itself.
  • PEP 3118 -- Revising the buffer protocol, which introduced a new buffer protocol for Python 3 (and maybe automates pickling for objects that support it).
  • New-style class: One can assume that the new-style classes have an influence on the pickling process.

So, how does all of this relate to Numpy arrays?

  1. Do Numpy arrays implement special methods, such as __reduce__, to inform Python how to pickle them (or do they use copyreg)? Numpy objects still expose a __reduce__ method, but it may be there only for compatibility reasons.
  2. Does Numpy use Python C-API structures that are supported out of the box by pickle (like the new buffer protocol), so that nothing more is needed in order to pickle Numpy arrays?
Phylliade asked Jan 29 '23 21:01


1 Answer

Python 3 pickle still supports __reduce__; it is covered under the Pickling Class Instances section of the pickle documentation.
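
As a minimal sketch of how Python 3 pickle consults __reduce__, the Celsius class below (a hypothetical name made up for illustration, not part of any library) returns a two-element tuple: a callable and the arguments to call it with when unpickling.

```python
import pickle


class Celsius:
    """Hypothetical example class, used only to illustrate __reduce__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __reduce__(self):
        # Tell pickle how to rebuild this object: call Celsius(self.degrees).
        return (Celsius, (self.degrees,))


original = Celsius(21.5)

# pickle.dumps() calls __reduce__ (via __reduce_ex__) to get the recipe,
# and pickle.loads() replays it to build a fresh, independent object.
restored = pickle.loads(pickle.dumps(original))
print(type(restored).__name__, restored.degrees)
```

The same hook is available to extension types written in C, which is why no separate mechanism is needed for them.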

Numpy's support has not changed in this regard; it implements __reduce__ on arrays to support pickling in either Python 2 or 3:

>>> import numpy
>>> numpy.array(0).__reduce__()
(<built-in function _reconstruct>, (<class 'numpy.ndarray'>, (0,), b'b'), (1, (), dtype('int64'), False, b'\x00\x00\x00\x00\x00\x00\x00\x00'))

A three-element tuple is returned, consisting of a function object to recreate the value, a tuple of arguments for that function, and a state tuple to pass to the new instance's __setstate__() method.
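
To make the role of each element concrete, the sketch below replays the tuple by hand, roughly mirroring what the unpickler does. It assumes a non-object dtype, and the variable names (func, args, state, rebuilt) are chosen here for illustration only.

```python
import pickle

import numpy as np

arr = np.arange(6, dtype=np.int64).reshape(2, 3)

# Unpack the three-element tuple returned by __reduce__.
func, args, state = arr.__reduce__()

# Step 1: call the reconstruction function to get an empty placeholder array.
rebuilt = func(*args)

# Step 2: hand the state (shape, dtype, raw bytes, ...) to __setstate__,
# which fills in the placeholder.
rebuilt.__setstate__(state)

print(np.array_equal(arr, rebuilt))

# A normal pickle round trip yields an equal array as well.
roundtrip = pickle.loads(pickle.dumps(arr))
print(np.array_equal(arr, roundtrip))
```

In practice there is no reason to call these methods directly; pickle.dumps() and pickle.loads() do it for you.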

Martijn Pieters answered Feb 02 '23 09:02