Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

IdentitySet in Python?

I need a set that uses an identity (is) comparison for add(), as opposed to a value (==) comparison.

For example, I have defined a Point class (with immutable x/y) that hooks __eq__ and __hash__. Points compare themselves correctly, and two instances of Point with the same x/y value answer True to == and False to is. I need to add two such instances to my set, and it's important that the result contain both instances even though their x/y values are the same. Some Smalltalks define IdentitySet for this very purpose.

I'm wondering, for example, if I can patch the built-in set class to include identityAdd and identityRemove methods. That will probably work, but it seems like a hack.

Is there a better way?

like image 281
Tom Stambaugh Avatar asked Jun 07 '13 23:06

Tom Stambaugh


People also ask

What is an identity in Python?

Python id() function returns the “identity” of the object. The identity of an object is an integer, which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

What is object in Python?

An object is simply a collection of data (variables) and methods (functions) that act on those data. Similarly, a class is a blueprint for that object.


3 Answers

Here's an implementation based on an id -> object map (suggested by @nmclean) and collections.MutableSet:

from collections import MutableSet

class IdentitySet(MutableSet):
    key = id  # should return a hashable object

    def __init__(self, iterable=()):
        self.map = {} # id -> object
        self |= iterable  # add elements from iterable to the set (union)

    def __len__(self):  # Sized
        return len(self.map)

    def __iter__(self):  # Iterable
        return self.map.itervalues()

    def __contains__(self, x):  # Container
        return self.key(x) in self.map

    def add(self, value):  # MutableSet
        """Add an element."""
        self.map[self.key(value)] = value

    def discard(self, value):  # MutableSet
        """Remove an element.  Do not raise an exception if absent."""
        self.map.pop(self.key(value), None)

    def __repr__(self):
        if not self:
            return '%s()' % (self.__class__.__name__,)
        return '%s(%r)' % (self.__class__.__name__, list(self))

Example:

a = (1, 2)
print IdentitySet([a, (1, 2), a])
# -> IdentitySet([(1, 2), (1, 2)]) # only one instance of `a`

print s != Set([a, (1, 2), a])  # it might be unequal because new literal
                                # tuple (1, 2) might have a different id
print s | Set([a, (1, 2), a])
# -> IdentitySet([(1, 2), (1, 2), (1, 2)]) # `a` plus two tuples from literals

MutableSet automatically provides methods: clear, pop, remove, __ior__, __iand__, __ixor__, __isub__, __le__, __lt__, __eq__, __ne__, __gt__, __ge__, __and__, __or__, __sub__, __xor__, and isdisjoint.

Here, unlike in set-based implementations, compound operations always use overridden basic methods e.g., | (self.__or__(), union) is implemented in terms of self.add() and therefore it provides correct for IdentitySet semantics.

like image 66
jfs Avatar answered Oct 05 '22 05:10

jfs


You can wrap the Point objects in something that will compare their identities.

class Ref(object):
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value is other.value

    def __hash__(self):
        return id(self.value)

Here is an IdentitySet implementation which wraps its elements in Ref objects:

from collections import MutableSet

class IdentitySet(MutableSet):
    def __init__(self, items = []):
        self.refs = set(map(Ref, items))

    def __contains__(self, elem):
        return Ref(elem) in self.refs

    def __iter__(self):
        return (ref.value for ref in self.refs)

    def __len__(self):
        return len(self.refs)

    def add(self, elem):
        self.refs.add(Ref(elem))

    def discard(self, elem):
        self.refs.discard(Ref(elem))

    def __repr__(self):
        return "%s(%s)" % (type(self).__name__, list(self))
like image 22
tom Avatar answered Oct 05 '22 05:10

tom


Well, I don't know if this is the best approach, but how about a dictionary of id => object? i.e.:

def insert_object(obj):
    id_dict[id(obj)] = obj

def contains_object(obj):
    return id_dict.has_key(id(obj))

id returns the true identity (the memory address), so comparing it is essentially equivalent to is.

But, is there really a good reason why you can't just update your eq / hash methods?

EDIT - Here's an idea that combines this with @tom's approach of a set subclass:

class IdentitySet(set):
    def __init__(self, items = []):
        set.__init__(self)
        self._identitydict = {}
        for item in items:
            self.add(item)

    def __contains__(self, elem):
        return set.__contains__(self, id(elem))

    def __iter__(self):
        for identity in set.__iter__(self):
            yield self._identitydict[identity]

    def add(self, elem):
        identity = id(elem)
        set.add(self, identity)
        self._identitydict[identity] = elem

    def discard(self, elem):
        if elem in self:
            self.remove(elem)

    def pop(self):
        return self._identitydict.pop(set.pop(self))

    def remove(self, elem):
        identity = id(elem)
        set.remove(self, identity)
        del self._identitydict[identity]

The set contains ids, and the ids are duplicated as dictionary keys which are mapped to the objects. It's essentially the same approach except that @tom is wrapping the input in a Ref and using id as the Ref's hash, whereas I'm just directly wrapping the input in id. It may be more efficient without the intermediate class.

like image 36
nmclean Avatar answered Oct 05 '22 03:10

nmclean