i'm using python 2.7
consider the following snippet of code (the example is contrived):
import datetime
class ScheduleData:
def __init__(self, date):
self.date = date
def __eq__(self, other):
try:
return self.date == other.date
except AttributeError as e:
return self.date == other
def __hash__(self):
return hash(self.date)
schedule_set = set()
schedule_set.add(ScheduleData(datetime.date(2010, 8, 7)))
schedule_set.add(ScheduleData(datetime.date(2010, 8, 8)))
schedule_set.add(ScheduleData(datetime.date(2010, 8, 9)))
print (datetime.date(2010, 8, 8) in schedule_set)
schedule_list = list(schedule_set)
print (datetime.date(2010, 8, 8) in schedule_list)
the output from this is unexpected (to me, at least):
[08:02 PM toolscripts]$ python test.py
True
False
in the first case, the given date is found in the schedule_set
as i have overridden the __hash__
and __eq__
functions.
from my understanding the in
operator will check against hash and equality for sets, but for lists it will simply iterate over the items in the list and check equality.
so what is happening here? why does my second test for in
on the list schedule_list
fail?
do i have to override some other function for lists?
Lists and tuples are standard Python data types that store values in a sequence. Sets are another standard Python data type that also store values. The major difference is that sets, unlike lists or tuples, cannot have multiple occurrences of the same element and store unordered values.
Generally the lists are faster than sets. But in the case of searching for an element in a collection, sets are faster because sets have been implemented using hash tables. So basically Python does not have to search the full set, which means that the time complexity in average is O(1).
Therefore, Python does not allow a set to store a list. You cannot add a list to a set. A set is an unordered collection of distinct hashable objects.
Sets cannot contain duplicates, and they will simply disappear. Sets use hashing to perform look ups which makes them way faster than lists in this regard.
@RyanHaining is correct. For a truly bizarre workaround, add this method to your class:
def timetuple(self):
return None
Then your program will print True
twice. The reasons for this are involved, having to do with an unfortunate history of comparisons in Python 2 being far too loose. The timetuple()
workaround is mostly explained in this part of the docs:
Note In order to stop comparison from falling back to the default scheme of comparing object addresses, datetime comparison normally raises TypeError if the other comparand isn’t also a datetime object. However, NotImplemented is returned instead if the other comparand has a timetuple() attribute. This hook gives other kinds of date objects a chance at implementing mixed-type comparison. If not, when a datetime object is compared to an object of a different type, TypeError is raised unless the comparison is == or !=. The latter cases return False or True, respectively.
datetime
was one of the first types added to Python that tried to offer less surprising comparison behavior. But, it couldn't become "really clean" until Python 3.
The issue is the comparison is invoking an __eq__
function opposite of what you're looking for. The __eq__
method defined works when you have a ScheduleData() == datetime.date()
but the in
operator is performing the comparison in the opposite order, datetime.date() == ScheduleData()
which is not invoking your defined __eq__
. Only the class acting as the left-hand side will have its __eq__
called.
The reason this problem occurs in python 2 and not 3 has to do with the definition of datetime.date.__eq__
in the std library. Take for example the following two classes:
class A(object):
def __eq__(self, other):
print ('A.__eq__')
return False
class B(object):
def __eq__(self, other):
print ('B.__eq__')
items = [A()]
B() in items
Running this code prints B.__eq__
under both Python 2 and Python 3. The B
object is used as the lhs, just as your datetime.date
object is used in Python 2. However, if I redefine B.__eq__
to resemble the Python 3 defintion of datetime.date.__eq__
:
class B(object):
def __eq__(self, other):
print ('First B.__eq__')
if isinstance(self, other.__class__):
print ('B.__eq__')
return NotImplemented
Then:
First B.__eq__
A.__eq__
is printed under both Python 2 and 3. The return of NotImplemented
causes the check with the arguments reversed.
Using timetuple
in your class will fix this problem, as @TimPeters stated (interesting quirk I was unaware of), though it seems that it need not be a function
class ScheduleData:
timetuple = None
is all you'd need in addition to what you have already.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With