Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

x not in set versus x != each element in the set?

Suppose we have x = ['a', 'b'].

What is happening under the hood for the statement:

x not in {None, False}

that raises the unhashable type: 'list' error?

The workaround I found is to write this instead:

x != None and x!= False

but I am confused because, mathematically, both boolean expressions are equivalent.

like image 839
Tfovid Avatar asked Jan 03 '23 16:01

Tfovid


2 Answers

Rationale

Here's what the official documentation states:

  1. [Python 3]: class set([iterable]):

    Return a new set or frozenset object whose elements are taken from iterable. The elements of a set must be hashable.

  2. [Python 3]: hashable:

    An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.
    ...
    All of Python’s immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not.

  3. [Python 3]: object.__contains__(self, item) (just above the anchor):

    The membership test operators (in and not in) are normally implemented as an iteration through a sequence. However, container objects can supply the following special method with a more efficient implementation, which also does not require the object be a sequence.

Going into [GitHub]: python/cpython - (v3.5.4) cpython/Objects/setobject.c:

  • Line #1991:

    static PyMethodDef set_methods[] = {
        {"add",             (PyCFunction)set_add,           METH_O,
         add_doc},
        {"clear",           (PyCFunction)set_clear,         METH_NOARGS,
         clear_doc},
        {"__contains__",(PyCFunction)set_direct_contains,           METH_O | METH_COEXIST,  // @TODO - cfati: MARK THIS LINE
    
  • Line #1843:

    static PyObject *
    set_direct_contains(PySetObject *so, PyObject *key)
    {
        long result;
    
        result = set_contains(so, key);  // @TODO - cfati: MARK THIS LINE
        if (result == -1)
            return NULL;
        return PyBool_FromLong(result);
    }
    
  • Line #1823:

    static int
    set_contains(PySetObject *so, PyObject *key)
    {
        PyObject *tmpkey;
        int rv;
    
        rv = set_contains_key(so, key);    // @TODO - cfati: MARK THIS LINE
        if (rv == -1) {
            if (!PySet_Check(key) || !PyErr_ExceptionMatches(PyExc_TypeError))
                return -1;
            PyErr_Clear();
            tmpkey = make_new_set(&PyFrozenSet_Type, key);
            if (tmpkey == NULL)
                return -1;
            rv = set_contains_key(so, tmpkey);  // @TODO - cfati: MARK THIS LINE
            Py_DECREF(tmpkey);
        }
        return rv;
    }
    
  • Line #627:

    static int
    set_contains_key(PySetObject *so, PyObject *key)
    {
        setentry entry;
        Py_hash_t hash;
    
        if (!PyUnicode_CheckExact(key) ||
            (hash = ((PyASCIIObject *) key)->hash) == -1) {  // @TODO - cfati: MARK THIS LINE
            hash = PyObject_Hash(key);
            if (hash == -1)
                return -1;
        }
        entry.key = key;
        entry.hash = hash;
        return set_contains_entry(so, &entry);  // @TODO - cfati: MARK THIS LINE
    }
    
  • Line #614:

    static int
    set_contains_entry(PySetObject *so, setentry *entry)
    {
        PyObject *key;
        setentry *lu_entry;
    
        lu_entry = set_lookkey(so, entry->key, entry->hash);  // @TODO - cfati: MARK THIS LINE
        if (lu_entry == NULL)
            return -1;
        key = lu_entry->key;
        return key != NULL && key != dummy;
    }
    

As seen from the "callstack" (presented in reversed order), in order to test for membership (in / not in), hash is being performed (on all code paths) on the candidate member ("includee"), and since the list instance doesn't have the hash functionality, the interpreter spits out TypeError.

Resolution

There is a number of ways to get around this (as many others already pointed out most of them):

  • Use a container that doesn't require its elements to be hashable (list, tuple)
  • Test for __hash__ member
  • Wrap the membership test in a try / except block
  • Use a a hashable container (tuple) for the element: x = ('a', 'b')

but (generally) these are just ways to get around the problem (this is my personal opinion), since if you end up comparing a list to None and False, the code (that yields that list) could use some refactoring.

like image 199
CristiFati Avatar answered Jan 05 '23 04:01

CristiFati


if you can enter all elements you want to test in a set, it means that all unhashable elements don't belong to your set (because you can't put them in)

You could do:

if x.__hash__ and x in {None,False}:

when object isn't hashable, x.__hash__ is None (other alternatives: Asking "is hashable" about a Python value) and the second part isn't evaluated.

or (better ask forgiveness than permission):

def belongs(x):
    try:
        return x in {None,False}
    except TypeError:   # unhashable type
        return False

both solutions are faster than using a list or tuple ((None,False)), because there's no linear search involved (that is if there are a lot of elements in the test list, not true for only 2 elements)

like image 33
Jean-François Fabre Avatar answered Jan 05 '23 05:01

Jean-François Fabre