Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python set().issubset() not working as expected

I'm trying to use set().issubset() for comparison of sequences. As you can imagine, it's not working as expected ;) In advance: sorry for the long code-blob.

class T(object):
  def __init__(self, value, attributes = None):
    self.value = value
    self.attributes = Attributes(attributes)

  def __eq__(self, other):
    if not isinstance(other, T):
      return False
    if self.value == other.value and self.attributes == other.attributes:
      return True
    else:
      return False

  def __ne__(self, other):
    if not isinstance(other, T):
      return True
    if self.value != other.value or self.attributes != other.attributes:
      return True
    else:
      return False

class Attributes(dict):
  def __init__(self, attributes):
    super(dict, self)
    self.update(attributes or dict())

  def __eq__(self, other):
    if self.items() == other.items():
      return True
    else:
      return False

  def __ne__(self, other):
    if not self.items() == other.items():
      return True
    else:
      return False

  def __cmp__(self, other):
    return self.items().__cmp__(other.items())


x = [T("I", {'pos': 0}), T("am", {'pos': 1}), T("a", {'pos': 2}), T("test", {'pos': 3})]
y = [T("a", {'pos': 2}), T("test", {'pos': 3})] 
xx = set(x)
yy = set(y)

assert y[0] == x[2], "__eq__ did fail, really?" #works
assert y[1] == x[3], "__eq__ did fail, really?" #works
assert xx-(yy-xx) == xx, "set subtract not working" #works, but is nonsense. see accepted answer
assert not xx.issubset(yy), "i am doing it wrong..." #works
assert yy.issubset(xx), "issubset not working :(" #FAILS!

Running above code fails the last assertion:

$ python
Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import issubsettest
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "issubsettest.py", line 52, in <module>
    assert yy.issubset(xx), "issubset not working :("
AssertionError: issubset not working :(
>>> 

What am I missing here?

like image 238
konfi Avatar asked Oct 19 '12 17:10

konfi


People also ask

How do I use Issubset in Python?

Python Set issubset() MethodThe issubset() method returns True if all items in the set exists in the specified set, otherwise it retuns False.

How do you check if set A contains every element of set B Python?

Python Set issubset() The issubset() method returns True if set A is the subset of B , i.e. if all the elements of set A are present in set B . Else, it returns False .

Is a set a subset of itself Python?

A set is always a subset of itself. A is a proper subset of B if the sets are not equal and set A is a subset of B.


1 Answers

Your objects are being hashed by their id (you didn't override __hash__). Of course they're not subsets since xx and yy contain unique objects.

In order to do this, you need to come up with some sort of __hash__ function. __hash__ should always return the same value for an object which is why it's usually understood that you don't mutate a hashable object. For instance, one choice could be:

class T(object):
    #<snip> ...

    def __hash__(self):
        return hash(self.value)

    #... </snip>

with the understanding that self.value can't change for the life of the object. (Note, I don't claim that is a good choice. The actual hash you use is really dependent on your actual application)


Now the why -- The magic behind sets (and dicts) and the amazing performance is that they rely on hashs. Basically, every hashable object is turned into a "unique" (in a perfect world) number. Python takes that "unique" number and turns it into an array index that it can use to get a handle on the object (the magic here is a bit difficult to explain, but it's not important for this discussion). So, rather than looking for an object by comparing it to all the other objects in the "array" (usually called a table) -- an expensive operation, it knows exactly where to look for the object based on the hash value (cheap). By default, user defined objects are hashed by their id (memory address). When you create the set xx, python looks at each of the objects ids and puts them in a based on their ids. Now when you do xx.issubset(yy), python looks at all of the ids in xx and checks to see if they are all in yy. But none of them are in yy since they're all unique objects (and therefore have unique hash values).

But you say, "why did xx-(yy-xx) == xx work?" Good question. Lets pull this apart.

First, we have (yy - xx). This returns an empty set because again, no elements in xx are also in yy (they all hash to different values since they all have unique ids). So you're really doing

xx - set([]) == xx

which should be pretty obvious why that is True.

like image 140
mgilson Avatar answered Sep 20 '22 02:09

mgilson