Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python set contains vs. list contains

i'm using python 2.7

consider the following snippet of code (the example is contrived):

import datetime

class ScheduleData:
    def __init__(self, date):
        self.date = date

    def __eq__(self, other):
        try:
            return self.date == other.date
        except AttributeError as e:
            return self.date == other

    def __hash__(self):
        return hash(self.date)



schedule_set = set()
schedule_set.add(ScheduleData(datetime.date(2010, 8, 7)))
schedule_set.add(ScheduleData(datetime.date(2010, 8, 8)))
schedule_set.add(ScheduleData(datetime.date(2010, 8, 9)))

print (datetime.date(2010, 8, 8) in schedule_set)

schedule_list = list(schedule_set)

print (datetime.date(2010, 8, 8) in schedule_list)

the output from this is unexpected (to me, at least):

[08:02 PM toolscripts]$ python test.py
True
False

in the first case, the given date is found in the schedule_set as i have overridden the __hash__ and __eq__ functions.

from my understanding the in operator will check against hash and equality for sets, but for lists it will simply iterate over the items in the list and check equality.

so what is happening here? why does my second test for in on the list schedule_list fail?

do i have to override some other function for lists?

like image 949
randomfigure Avatar asked Nov 26 '13 04:11

randomfigure


People also ask

What is the difference between a list and a set in Python?

Lists and tuples are standard Python data types that store values in a sequence. Sets are another standard Python data type that also store values. The major difference is that sets, unlike lists or tuples, cannot have multiple occurrences of the same element and store unordered values.

Is set faster than list in Python?

Generally the lists are faster than sets. But in the case of searching for an element in a collection, sets are faster because sets have been implemented using hash tables. So basically Python does not have to search the full set, which means that the time complexity in average is O(1).

Can a Python set contain lists?

Therefore, Python does not allow a set to store a list. You cannot add a list to a set. A set is an unordered collection of distinct hashable objects.

Why are sets more efficient than lists?

Sets cannot contain duplicates, and they will simply disappear. Sets use hashing to perform look ups which makes them way faster than lists in this regard.


2 Answers

@RyanHaining is correct. For a truly bizarre workaround, add this method to your class:

def timetuple(self):
    return None

Then your program will print True twice. The reasons for this are involved, having to do with an unfortunate history of comparisons in Python 2 being far too loose. The timetuple() workaround is mostly explained in this part of the docs:

Note In order to stop comparison from falling back to the default scheme of comparing object addresses, datetime comparison normally raises TypeError if the other comparand isn’t also a datetime object. However, NotImplemented is returned instead if the other comparand has a timetuple() attribute. This hook gives other kinds of date objects a chance at implementing mixed-type comparison. If not, when a datetime object is compared to an object of a different type, TypeError is raised unless the comparison is == or !=. The latter cases return False or True, respectively.

datetime was one of the first types added to Python that tried to offer less surprising comparison behavior. But, it couldn't become "really clean" until Python 3.

like image 44
Tim Peters Avatar answered Sep 30 '22 19:09

Tim Peters


The issue is the comparison is invoking an __eq__ function opposite of what you're looking for. The __eq__ method defined works when you have a ScheduleData() == datetime.date() but the in operator is performing the comparison in the opposite order, datetime.date() == ScheduleData() which is not invoking your defined __eq__. Only the class acting as the left-hand side will have its __eq__ called.

The reason this problem occurs in python 2 and not 3 has to do with the definition of datetime.date.__eq__ in the std library. Take for example the following two classes:

class A(object):
    def __eq__(self, other):
        print ('A.__eq__')
        return False

class B(object):
    def __eq__(self, other):
        print ('B.__eq__')

items = [A()]
B() in items

Running this code prints B.__eq__ under both Python 2 and Python 3. The B object is used as the lhs, just as your datetime.date object is used in Python 2. However, if I redefine B.__eq__ to resemble the Python 3 defintion of datetime.date.__eq__:

class B(object):
    def __eq__(self, other):
        print ('First B.__eq__')
        if isinstance(self, other.__class__):
            print ('B.__eq__')
        return NotImplemented

Then:

First B.__eq__
A.__eq__ 

is printed under both Python 2 and 3. The return of NotImplemented causes the check with the arguments reversed.

Using timetuple in your class will fix this problem, as @TimPeters stated (interesting quirk I was unaware of), though it seems that it need not be a function

class ScheduleData:
    timetuple = None

is all you'd need in addition to what you have already.

like image 142
Ryan Haining Avatar answered Sep 30 '22 21:09

Ryan Haining