I'm trying to use set().issubset()
for comparison of sequences. As you can imagine, it's not working as expected ;) In advance: sorry for the long code-blob.
class T(object):
def __init__(self, value, attributes = None):
self.value = value
self.attributes = Attributes(attributes)
def __eq__(self, other):
if not isinstance(other, T):
return False
if self.value == other.value and self.attributes == other.attributes:
return True
else:
return False
def __ne__(self, other):
if not isinstance(other, T):
return True
if self.value != other.value or self.attributes != other.attributes:
return True
else:
return False
class Attributes(dict):
def __init__(self, attributes):
super(dict, self)
self.update(attributes or dict())
def __eq__(self, other):
if self.items() == other.items():
return True
else:
return False
def __ne__(self, other):
if not self.items() == other.items():
return True
else:
return False
def __cmp__(self, other):
return self.items().__cmp__(other.items())
x = [T("I", {'pos': 0}), T("am", {'pos': 1}), T("a", {'pos': 2}), T("test", {'pos': 3})]
y = [T("a", {'pos': 2}), T("test", {'pos': 3})]
xx = set(x)
yy = set(y)
assert y[0] == x[2], "__eq__ did fail, really?" #works
assert y[1] == x[3], "__eq__ did fail, really?" #works
assert xx-(yy-xx) == xx, "set subtract not working" #works, but is nonsense. see accepted answer
assert not xx.issubset(yy), "i am doing it wrong..." #works
assert yy.issubset(xx), "issubset not working :(" #FAILS!
Running above code fails the last assertion:
$ python
Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import issubsettest
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "issubsettest.py", line 52, in <module>
assert yy.issubset(xx), "issubset not working :("
AssertionError: issubset not working :(
>>>
What am I missing here?
Python Set issubset() MethodThe issubset() method returns True if all items in the set exists in the specified set, otherwise it retuns False.
Python Set issubset() The issubset() method returns True if set A is the subset of B , i.e. if all the elements of set A are present in set B . Else, it returns False .
A set is always a subset of itself. A is a proper subset of B if the sets are not equal and set A is a subset of B.
Your objects are being hashed by their id
(you didn't override __hash__
). Of course they're not subsets since xx
and yy
contain unique objects.
In order to do this, you need to come up with some sort of __hash__
function. __hash__
should always return the same value for an object which is why it's usually understood that you don't mutate a hashable object. For instance, one choice could be:
class T(object):
#<snip> ...
def __hash__(self):
return hash(self.value)
#... </snip>
with the understanding that self.value
can't change for the life of the object. (Note, I don't claim that is a good choice. The actual hash you use is really dependent on your actual application)
Now the why -- The magic behind sets (and dicts) and the amazing performance is that they rely on hashs. Basically, every hashable object is turned into a "unique" (in a perfect world) number. Python takes that "unique" number and turns it into an array index that it can use to get a handle on the object (the magic here is a bit difficult to explain, but it's not important for this discussion). So, rather than looking for an object by comparing it to all the other objects in the "array" (usually called a table) -- an expensive operation, it knows exactly where to look for the object based on the hash value (cheap). By default, user defined objects are hashed by their id
(memory address). When you create the set xx
, python looks at each of the objects id
s and puts them in a based on their ids. Now when you do xx.issubset(yy)
, python looks at all of the ids in xx
and checks to see if they are all in yy
. But none of them are in yy
since they're all unique objects (and therefore have unique hash values).
But you say, "why did xx-(yy-xx) == xx
work?" Good question. Lets pull this apart.
First, we have (yy - xx)
. This returns an empty set because again, no elements in xx
are also in yy
(they all hash to different values since they all have unique id
s). So you're really doing
xx - set([]) == xx
which should be pretty obvious why that is True
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With