
Behavior of object in set operations

Tags: python, object, set

I'm trying to create a custom object that behaves properly in set operations.

I've generally got it working, but I want to make sure I fully understand the implications. In particular, I'm interested in the behavior when the object holds additional data that isn't included in the equality / hash methods. It seems that the intersection operation returns the objects from the set passed as the argument, whereas the union operation returns the objects from the set it is called on.

To illustrate:

class MyObject:
    def __init__(self, value, meta):
        self.value = value
        self.meta = meta
    def __eq__(self, other):
        # equality deliberately ignores meta
        return self.value == other.value
    def __hash__(self):
        # consistent with __eq__: objects that compare equal hash the same
        return hash(self.value)

a = MyObject('1', 'left')
b = MyObject('1', 'right')
c = MyObject('2', 'left')
d = MyObject('2', 'right')
e = MyObject('3', 'left')
print(a == b)  # True
print(a == c)  # False

for i in set([a, c, e]).intersection(set([b, d])):
    print("%s %s" % (i.value, i.meta))
# prints:
# 1 right
# 2 right

for i in set([a, c, e]).union(set([b, d])):
    print("%s %s" % (i.value, i.meta))
# prints:
# 1 left
# 3 left
# 2 left

Is this behavior documented somewhere and deterministic? If so, what is the governing principle?

asked Apr 07 '10 18:04 by Josh Arenberg


2 Answers

Nope, it's not documented, and you can't count on it being deterministic. The problem is that you've broken the assumption behind eq and hash: that two objects which compare equal are interchangeable. Fix your object; don't try to be clever and rely on how set happens to be implemented. If the meta value is part of MyObject's identity, it should be included in eq and hash.
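On Python 3.7 or later, one way to keep the two methods consistent is to let the `dataclasses` module generate both of them from the same fields. This is a sketch, not part of the original answer, and it assumes `meta` really is part of the object's identity. (If it weren't, declaring it with `field(compare=False)` would leave it out of both the generated `__eq__` and `__hash__`, which is exactly the question's setup.)

```python
from dataclasses import dataclass

# Sketch: a frozen dataclass derives __eq__ and __hash__ from the same
# fields, so the two methods can't drift out of sync.
@dataclass(frozen=True)
class MyObject:
    value: str
    meta: str  # part of the identity: compared and hashed along with value

a = MyObject('1', 'left')
b = MyObject('1', 'right')

print(a == b)                      # False: meta differs
print(len({a, b}))                 # 2: both objects survive in a set
print(a == MyObject('1', 'left'))  # True: same value and meta
```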

You can't rely on set's intersection to keep the objects from one particular operand, so there's no easy way to get what you want from it directly. What you'd end up doing is taking the intersection by value only, then, for each result, looking through all your objects for the older equivalent to replace it with. No built-in set operation does that for you.
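A sketch of that value-only intersection. It repeats a corrected MyObject (equality and hash over both value and meta) so it runs on its own; `intersect_by_value` is a hypothetical helper name. The point is that the caller states explicitly which operand's objects are kept, instead of depending on whichever objects `set.intersection` happens to return.

```python
class MyObject:
    """Corrected class: equality and hashing both use (value, meta)."""
    def __init__(self, value, meta):
        self.value = value
        self.meta = meta
    def __eq__(self, other):
        if not isinstance(other, MyObject):
            return NotImplemented
        return (self.value, self.meta) == (other.value, other.meta)
    def __hash__(self):
        return hash((self.value, self.meta))

def intersect_by_value(preferred, other):
    """Hypothetical helper: keep the objects from `preferred` whose value
    also occurs somewhere in `other`."""
    other_values = {obj.value for obj in other}
    return {obj for obj in preferred if obj.value in other_values}

a = MyObject('1', 'left')
b = MyObject('1', 'right')
c = MyObject('2', 'left')
d = MyObject('2', 'right')
e = MyObject('3', 'left')

result = intersect_by_value({a, c, e}, {b, d})
# sort so the printed output doesn't depend on set iteration order
print(sorted((o.value, o.meta) for o in result))
# [('1', 'left'), ('2', 'left')]
```

Swapping the arguments keeps the 'right' objects instead, regardless of which set is larger.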

Unions are not so bad:

import itertools

# fix __eq__ and __hash__ so both use value and meta
class MyObject:
    def __init__(self, value, meta):
        self.value = value
        self.meta = meta
    def __eq__(self, other):
        # parenthesize both sides: without them this returns a 3-tuple, which is always truthy
        return (self.value, self.meta) == (other.value, other.meta)
    def __hash__(self):
        return hash((self.value, self.meta))
    def __repr__(self):
        return "%s %s" % (self.value, self.meta)

a = MyObject('1', 'left')
b = MyObject('1', 'right')
c = MyObject('2', 'left')
d = MyObject('2', 'right')
e = MyObject('3', 'left')

union = set([a, c, e]).union(set([b, d]))
print(union)
# e.g. set([2 left, 2 right, 1 left, 3 left, 1 right]); iteration order is arbitrary

# sort the objects so that older objects come before their newer equivalents ('left' sorts before 'right')
sl = sorted(union, key=lambda x: (x.value, x.meta))
print(sl)
# [1 left, 1 right, 2 left, 2 right, 3 left]

# group the objects by value; groupby only groups adjacent items, so the input must be sorted
filtered = itertools.groupby(sl, lambda x: x.value)
# keep the oldest object (the first one in each group)
oldest = [next(group) for key, group in filtered]
print(oldest)
# [1 left, 2 left, 3 left]
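If "oldest" simply means "from the first collection", a dict keyed on value can replace the sort-and-group step. This is an alternative sketch rather than the answer's method: `union_prefer_first` is a hypothetical name, and it relies on dicts preserving insertion order (Python 3.7+). Pass ordered inputs such as lists if it matters which object wins within a single group.

```python
import itertools

class MyObject:
    """Corrected class: equality and hashing both use (value, meta)."""
    def __init__(self, value, meta):
        self.value = value
        self.meta = meta
    def __eq__(self, other):
        if not isinstance(other, MyObject):
            return NotImplemented
        return (self.value, self.meta) == (other.value, other.meta)
    def __hash__(self):
        return hash((self.value, self.meta))
    def __repr__(self):
        return "%s %s" % (self.value, self.meta)

def union_prefer_first(*groups):
    """Hypothetical helper: merge the groups, keeping the first object seen
    for each value; earlier groups take priority over later ones."""
    merged = {}
    for obj in itertools.chain(*groups):
        # setdefault only stores obj if this value hasn't been seen yet
        merged.setdefault(obj.value, obj)
    return list(merged.values())

a = MyObject('1', 'left')
b = MyObject('1', 'right')
c = MyObject('2', 'left')
d = MyObject('2', 'right')
e = MyObject('3', 'left')

print(union_prefer_first([a, c, e], [b, d]))
# [1 left, 2 left, 3 left]
```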
answered Oct 22 '22 03:10 by hlfrk414


The order of the operands doesn't appear to matter:

>>> [ (u.value, u.meta) for u in set([b,d]).intersection(set([a,c,e])) ]
[('1', 'right'), ('2', 'right')]

>>> [ (u.value, u.meta) for u in set([a,c,e]).intersection(set([b,d])) ]
[('1', 'right'), ('2', 'right')]

However, if you do this:

>>> f = MyObject('3', 'right')

And add f to the "right" set:

>>> [ (u.value, u.meta) for u in set([a,c,e]).intersection(set([b,d,f])) ]
[('1', 'right'), ('3', 'right'), ('2', 'right')]

>>> [ (u.value, u.meta) for u in set([b,d,f]).intersection(set([a,c,e])) ]
[('1', 'left'), ('3', 'left'), ('2', 'left')]

So the behavior depends on the sizes of the sets (the same effect happens with union), and it may depend on other factors as well. If you want to know why, you'll have to dig through the Python source.
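A small script that reproduces the intersection results above. In CPython's `setobject.c`, `set.intersection` (and the `&` operator) loops over the smaller of the two sets, or over the argument when the sizes tie, and keeps the objects from that set, which matches what the session shows. Treat this as a CPython implementation detail, not documented behavior: other interpreters or future versions may differ, so don't build on it. The `metas` helper is just for printing.

```python
class MyObject:
    """The question's class: equality and hashing use value only."""
    def __init__(self, value, meta):
        self.value = value
        self.meta = meta
    def __eq__(self, other):
        if not isinstance(other, MyObject):
            return NotImplemented
        return self.value == other.value
    def __hash__(self):
        return hash(self.value)

def metas(objs):
    """Sorted (value, meta) pairs, so output doesn't depend on iteration order."""
    return sorted((o.value, o.meta) for o in objs)

a = MyObject('1', 'left')
b = MyObject('1', 'right')
c = MyObject('2', 'left')
d = MyObject('2', 'right')
e = MyObject('3', 'left')
f = MyObject('3', 'right')

# Different sizes: the objects come from the smaller set, {b, d}, either way round.
print(metas({a, c, e} & {b, d}))  # [('1', 'right'), ('2', 'right')]
print(metas({b, d} & {a, c, e}))  # [('1', 'right'), ('2', 'right')]

# Equal sizes: the objects come from the argument.
print(metas({a, c, e}.intersection({b, d, f})))  # [('1', 'right'), ('2', 'right'), ('3', 'right')]
print(metas({b, d, f}.intersection({a, c, e})))  # [('1', 'left'), ('2', 'left'), ('3', 'left')]
```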

answered Oct 22 '22 04:10 by Seth