Why is bytearray not a Sequence in Python 2?

I'm seeing a weird discrepancy in behavior between Python 2 and 3.

In Python 3 things seem to work fine:

Python 3.5.0rc2 (v3.5.0rc2:cc15d736d860, Aug 25 2015, 04:45:41) [MSC v.1900 32 bit (Intel)] on win32
>>> from collections import Sequence
>>> isinstance(bytearray(b"56"), Sequence)
True

But not in Python 2:

Python 2.7.10 (default, May 23 2015, 09:44:00) [MSC v.1500 64 bit (AMD64)] on win32
>>> from collections import Sequence
>>> isinstance(bytearray("56"), Sequence)
False

The results seem to be consistent across minor releases of both Python 2.x and 3.x. Is this a known bug? Is it a bug at all? Is there any logic behind this difference?

I am actually more worried about whether the C API function PySequence_Check properly identifies an object of type PyByteArray_Type as exposing the sequence protocol. Judging by the source code it should, but any insight into this whole thing is very welcome.

asked Aug 27 '15 19:08 by Jaime


2 Answers

Abstract classes from collections use ABCMeta.register(subclass) to:

Register subclass as a “virtual subclass” of this ABC.
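To make "virtual subclass" concrete, here is a minimal Python 3 sketch. The names Shape and Square are hypothetical and exist only for illustration; the point is that register() changes what issubclass/isinstance report without touching the registered class's MRO.

```python
from abc import ABCMeta

# Hypothetical ABC and class, invented only to illustrate register().
class Shape(metaclass=ABCMeta):
    pass

class Square:  # note: does NOT inherit from Shape
    pass

print(issubclass(Square, Shape))    # False: no relationship yet
Shape.register(Square)              # make Square a "virtual subclass" of Shape
print(issubclass(Square, Shape))    # True
print(isinstance(Square(), Shape))  # True
print(Shape in Square.__mro__)      # False: the MRO is unchanged
```

The same mechanism is what links bytearray to Sequence in Python 3: bytearray itself inherits only from object.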

In Python 3, issubclass(bytearray, Sequence) returns True because bytearray is explicitly registered as a virtual subclass of ByteString (which is derived from Sequence) and of MutableSequence. See the relevant part of Lib/_collections_abc.py:

class ByteString(Sequence):

    """This unifies bytes and bytearray.

    XXX Should add all their methods.
    """

    __slots__ = ()

ByteString.register(bytes)
ByteString.register(bytearray)
...
MutableSequence.register(bytearray)  # Multiply inheriting, see ByteString
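A quick way to observe the effect of those registrations on a current Python 3. This sketch imports from collections.abc, since importing the ABCs directly from collections (as in the question) was deprecated in 3.3 and stopped working in 3.10.

```python
from collections.abc import MutableSequence, Sequence

print(isinstance(bytearray(b"56"), Sequence))         # True
print(isinstance(bytearray(b"56"), MutableSequence))  # True
print(isinstance(b"56", Sequence))                    # True (bytes is registered too)
print(isinstance(b"56", MutableSequence))             # False: bytes is immutable
```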

Python 2 doesn't do that (from Lib/_abcoll.py):

Sequence.register(tuple)
Sequence.register(basestring)
Sequence.register(buffer)
Sequence.register(xrange)
...
MutableSequence.register(list)
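Since the Python 2 difference is just a missing register() call, one possible workaround (a sketch, not something either Python version does for you) is to perform the registration yourself at import time. Registering with MutableSequence is enough, because ABCMeta's subclass check also consults subclasses of Sequence. Be aware that this mutates the process-wide ABC registry, so it affects every other module asking the same question.

```python
try:
    from collections.abc import MutableSequence, Sequence  # Python 3.3+
except ImportError:
    from collections import MutableSequence, Sequence      # Python 2.7

# Python 2's _abcoll.py never registers bytearray, so do it here.
# On Python 3 the check is already True and nothing is changed.
if not issubclass(bytearray, MutableSequence):
    MutableSequence.register(bytearray)

print(isinstance(bytearray(b"56"), Sequence))  # True on both 2.7 and 3.x
```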

This behaviour was changed in Python 3.0; the commit that introduced it says:

Add ABC ByteString which unifies bytes and bytearray (but not memoryview). There's no ABC for "PEP 3118 style buffer API objects" because there's no way to recognize these in Python (apart from trying to use memoryview() on them).

And there's more information in PEP 3119:

This is a proposal to add Abstract Base Class (ABC) support to Python 3000. It proposes: [...] Specific ABCs for containers and iterators, to be added to the collections module.

Much of the thinking that went into the proposal is not about the specific mechanism of ABCs, as contrasted with Interfaces or Generic Functions (GFs), but about clarifying philosophical issues like "what makes a set", "what makes a mapping" and "what makes a sequence".

[...] a metaclass for use with ABCs that will allow us to add an ABC as a "virtual base class" (not the same concept as in C++) to any class, including to another ABC. This allows the standard library to define ABCs Sequence and MutableSequence and register these as virtual base classes for built-in types like basestring, tuple and list, so that for example the following conditions are all true: [...] issubclass(bytearray, MutableSequence).

Just FYI, memoryview was registered as a subclass of Sequence only in Python 3.4:

There's no ducktyping for this due to the Sequence/Mapping confusion so it's a simple missing explicit registration.

(see issue18690 for details).
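Both halves of that quote can be checked in Python 3: memoryview passes because it is registered, while a class that merely has the right methods does not, since Sequence performs no structural (duck-typing) check, unlike Sized or Iterable. The Duck class below is hypothetical and exists only for this illustration.

```python
from collections.abc import Sequence

print(isinstance(memoryview(b"56"), Sequence))  # True (registered since 3.4)

# Hypothetical class with __len__ and __getitem__, but never registered.
class Duck:
    def __len__(self):
        return 2

    def __getitem__(self, index):
        if not 0 <= index < 2:
            raise IndexError(index)
        return index

print(list(Duck()))                  # [0, 1]: it iterates like a sequence...
print(isinstance(Duck(), Sequence))  # False: ...but the ABC does not know that
```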


PySequence_Check from the Python C API does not rely on the collections module:

int
PySequence_Check(PyObject *s)
{
    if (PyDict_Check(s))
        return 0;
    return s != NULL && s->ob_type->tp_as_sequence &&
        s->ob_type->tp_as_sequence->sq_item != NULL;
}

It checks that the type's tp_as_sequence field is non-NULL and, if so, that its sq_item slot is non-NULL (sq_item is basically the C-level __getitem__). bytearray fills in both, so PySequence_Check returns 1 for bytearray objects in Python 2 as well.
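To confirm this from Python without writing an extension module, you can call the C function directly through ctypes. This is a sketch that assumes CPython (ctypes.pythonapi is not available on other implementations); it works on both Python 2.7 and 3.x.

```python
import ctypes

# ctypes.pythonapi exposes the running interpreter's C API (CPython only).
PySequence_Check = ctypes.pythonapi.PySequence_Check
PySequence_Check.argtypes = [ctypes.py_object]
PySequence_Check.restype = ctypes.c_int

print(PySequence_Check(bytearray(b"56")))  # 1: tp_as_sequence->sq_item is set
print(PySequence_Check([1, 2]))            # 1
print(PySequence_Check({"a": 1}))          # 0: rejected by the PyDict_Check test
print(PySequence_Check(42))                # 0: int has no tp_as_sequence
```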

answered Oct 20 '22 17:10 by vaultah


If you look at the source code of the collections abstract classes, you will see that in Python 3 (file _collections_abc.py) ByteString, a subclass of Sequence, registers bytearray as a virtual subclass, while in Python 2 (file _abcoll.py) there is no ByteString class and Sequence never registers bytearray.

By "register" I mean that the abstract class Sequence (or its subclass ByteString) calls the abc.ABCMeta.register method, which, as its documentation says, will "Register subclass as a 'virtual subclass' of this ABC."

I think that is what causes the different behaviour between Python 2 and 3, but IMHO it's a bug in Python 2 (or rather, a bug that was fixed in Python 3).

answered Oct 20 '22 15:10 by beezz