Python set().issubset() not working as expected

Tags:

I'm trying to use set().issubset() for comparison of sequences. As you can imagine, it's not working as expected ;) In advance: sorry for the long code-blob.

class T(object):
  def __init__(self, value, attributes = None):
    self.value = value
    self.attributes = Attributes(attributes)

  def __eq__(self, other):
    if not isinstance(other, T):
      return False
    if self.value == other.value and self.attributes == other.attributes:
      return True
    else:
      return False

  def __ne__(self, other):
    if not isinstance(other, T):
      return True
    if self.value != other.value or self.attributes != other.attributes:
      return True
    else:
      return False

class Attributes(dict):
  def __init__(self, attributes):
    super(dict, self)
    self.update(attributes or dict())

  def __eq__(self, other):
    if self.items() == other.items():
      return True
    else:
      return False

  def __ne__(self, other):
    if not self.items() == other.items():
      return True
    else:
      return False

  def __cmp__(self, other):
    return self.items().__cmp__(other.items())


x = [T("I", {'pos': 0}), T("am", {'pos': 1}), T("a", {'pos': 2}), T("test", {'pos': 3})]
y = [T("a", {'pos': 2}), T("test", {'pos': 3})] 
xx = set(x)
yy = set(y)

assert y[0] == x[2], "__eq__ did fail, really?" #works
assert y[1] == x[3], "__eq__ did fail, really?" #works
assert xx-(yy-xx) == xx, "set subtract not working" #works, but is nonsense. see accepted answer
assert not xx.issubset(yy), "i am doing it wrong..." #works
assert yy.issubset(xx), "issubset not working :(" #FAILS!

Running above code fails the last assertion:

$ python
Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import issubsettest
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "issubsettest.py", line 52, in <module>
    assert yy.issubset(xx), "issubset not working :("
AssertionError: issubset not working :(
>>>

What am I missing here?

238

asked Oct 19 '12 17:10

konfi

1 Answers

Your objects are being hashed by their id (you didn't override __hash__). Of course they're not subsets since xx and yy contain unique objects.

In order to do this, you need to come up with some sort of __hash__ function. __hash__ should always return the same value for an object which is why it's usually understood that you don't mutate a hashable object. For instance, one choice could be:

class T(object):
    #<snip> ...

    def __hash__(self):
        return hash(self.value)

    #... </snip>

with the understanding that self.value can't change for the life of the object. (Note, I don't claim that is a good choice. The actual hash you use is really dependent on your actual application)

Now the why -- The magic behind sets (and dicts) and the amazing performance is that they rely on hashs. Basically, every hashable object is turned into a "unique" (in a perfect world) number. Python takes that "unique" number and turns it into an array index that it can use to get a handle on the object (the magic here is a bit difficult to explain, but it's not important for this discussion). So, rather than looking for an object by comparing it to all the other objects in the "array" (usually called a table) -- an expensive operation, it knows exactly where to look for the object based on the hash value (cheap). By default, user defined objects are hashed by their id (memory address). When you create the set xx, python looks at each of the objects ids and puts them in a based on their ids. Now when you do xx.issubset(yy), python looks at all of the ids in xx and checks to see if they are all in yy. But none of them are in yy since they're all unique objects (and therefore have unique hash values).

But you say, "why did xx-(yy-xx) == xx work?" Good question. Lets pull this apart.

First, we have (yy - xx). This returns an empty set because again, no elements in xx are also in yy (they all hash to different values since they all have unique ids). So you're really doing

xx - set([]) == xx

which should be pretty obvious why that is True.

140

answered Sep 20 '22 02:09

mgilson

Related questions
                            
                                Why does the Tkinter canvas request 4 extra pixels for the width and height?
                            
                                How to create cgi.FieldStorage for testing purposes?
                            
                                How to create a dict from 2 dictionaries in Python?
                            
                                Scope of variables in python decorators - changing parameters
                            
                                SQLAlchemy column_property basics
                            
                                Most pythonic way of accepting arguments using optparse
                            
                                Killing a script launched in a Process via os.system()
                            
                                Python lxml - get index of tag's text
                            
                                render throws error AttributeError META - (Exception location: __getattr__ in urllib2)
                            
                                Matrix calculations on equidistant elements without for loops
                            
                                Most efficient way to traverse file structure Python
                            
                                Given module m and code object c, what does "exec c in m.__dict__" do?
                            
                                Is there any better way to capture the screen than PIL.ImageGrab.grab()?
                            
                                python __get__ method
                            
                                Generating a directed word graph in python
                            
                                Python's equivalent for Matlab's persistent [duplicate]
                            
                                How to get all YouTube comments with Python's gdata module?
                            
                                Python - open new shell and run command
                            
                                Removing Duplicates from Nested List Based on First 2 Elements
                            
                                Why don't backreferences work in Python's re.sub when using a replacement function?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Python set().issubset() not working as expected

Tags:

python

set

python-2.7

konfi

People also ask

1 Answers

mgilson

Recent Activity

Donate For Us