x not in set versus x != each element in the set?

Question

Suppose we have x = ['a', 'b'].

What is happening under the hood for the statement:

x not in {None, False}

that raises the unhashable type: 'list' error?

The workaround I found is to write this instead:

x != None and x!= False

but I am confused because, mathematically, both boolean expressions are equivalent.

CristiFati · Accepted Answer

Rationale

Here's what the official documentation states:

[Python 3]: class set([iterable]):

Return a new set or frozenset object whose elements are taken from iterable. The elements of a set must be hashable.
[Python 3]: hashable:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.
...
All of Python’s immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not.
[Python 3]: object.__contains__(self, item) (just above the anchor):

The membership test operators (in and not in) are normally implemented as an iteration through a sequence. However, container objects can supply the following special method with a more efficient implementation, which also does not require the object be a sequence.

Going into [GitHub]: python/cpython - (v3.5.4) cpython/Objects/setobject.c:

Line #1991:

static PyMethodDef set_methods[] = {
    {"add",             (PyCFunction)set_add,           METH_O,
     add_doc},
    {"clear",           (PyCFunction)set_clear,         METH_NOARGS,
     clear_doc},
    {"__contains__",(PyCFunction)set_direct_contains,           METH_O | METH_COEXIST,  // @TODO - cfati: MARK THIS LINE

Line #1843:

static PyObject *
set_direct_contains(PySetObject *so, PyObject *key)
{
    long result;

    result = set_contains(so, key);  // @TODO - cfati: MARK THIS LINE
    if (result == -1)
        return NULL;
    return PyBool_FromLong(result);
}

Line #1823:

static int
set_contains(PySetObject *so, PyObject *key)
{
    PyObject *tmpkey;
    int rv;

    rv = set_contains_key(so, key);    // @TODO - cfati: MARK THIS LINE
    if (rv == -1) {
        if (!PySet_Check(key) || !PyErr_ExceptionMatches(PyExc_TypeError))
            return -1;
        PyErr_Clear();
        tmpkey = make_new_set(&PyFrozenSet_Type, key);
        if (tmpkey == NULL)
            return -1;
        rv = set_contains_key(so, tmpkey);  // @TODO - cfati: MARK THIS LINE
        Py_DECREF(tmpkey);
    }
    return rv;
}

Line #627:

static int
set_contains_key(PySetObject *so, PyObject *key)
{
    setentry entry;
    Py_hash_t hash;

    if (!PyUnicode_CheckExact(key) ||
        (hash = ((PyASCIIObject *) key)->hash) == -1) {  // @TODO - cfati: MARK THIS LINE
        hash = PyObject_Hash(key);
        if (hash == -1)
            return -1;
    }
    entry.key = key;
    entry.hash = hash;
    return set_contains_entry(so, &entry);  // @TODO - cfati: MARK THIS LINE
}

Line #614:

static int
set_contains_entry(PySetObject *so, setentry *entry)
{
    PyObject *key;
    setentry *lu_entry;

    lu_entry = set_lookkey(so, entry->key, entry->hash);  // @TODO - cfati: MARK THIS LINE
    if (lu_entry == NULL)
        return -1;
    key = lu_entry->key;
    return key != NULL && key != dummy;
}

As seen from the "callstack" (presented in reversed order), in order to test for membership (in / not in), hash is being performed (on all code paths) on the candidate member ("includee"), and since the list instance doesn't have the hash functionality, the interpreter spits out TypeError.

Resolution

There is a number of ways to get around this (as many others already pointed out most of them):

Use a container that doesn't require its elements to be hashable (list, tuple)
Test for __hash__ member
Wrap the membership test in a try / except block
Use a a hashable container (tuple) for the element: x = ('a', 'b')

but (generally) these are just ways to get around the problem (this is my personal opinion), since if you end up comparing a list to None and False, the code (that yields that list) could use some refactoring.

Jean-François Fabre · Answer

if you can enter all elements you want to test in a set, it means that all unhashable elements don't belong to your set (because you can't put them in)

You could do:

if x.__hash__ and x in {None,False}:

when object isn't hashable, x.__hash__ is None (other alternatives: Asking "is hashable" about a Python value) and the second part isn't evaluated.

or (better ask forgiveness than permission):

def belongs(x):
    try:
        return x in {None,False}
    except TypeError:   # unhashable type
        return False

both solutions are faster than using a list or tuple ((None,False)), because there's no linear search involved (that is if there are a lot of elements in the test list, not true for only 2 elements)

x not in set versus x != each element in the set?

Tags:

python

python-3.x

Tfovid

2 Answers

Rationale

Resolution

CristiFati

Jean-François Fabre

Recent Activity

Donate For Us

x not in set versus x != each element in the set?

Tags:

python

python-3.x

Tfovid

2 Answers

Rationale

Resolution

CristiFati

Jean-François Fabre

Related questions

Recent Activity

Donate For Us