I need a set that uses an identity (is
) comparison for add()
, as opposed to a value (==
) comparison.
For example, I have defined a Point
class (with immutable x/y) that hooks __eq__
and __hash__
. Point
s compare themselves correctly, and two instances of Point
with the same x/y value answer True
to ==
and False
to is
. I need to add two such instances to my set, and it's important that the result contain both instances even though their x/y values are the same. Some Smalltalks define IdentitySet for this very purpose.
I'm wondering, for example, if I can patch the built-in set class to include identityAdd
and identityRemove
methods. That will probably work, but it seems like a hack.
Is there a better way?
Python id() function returns the “identity” of the object. The identity of an object is an integer, which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
An object is simply a collection of data (variables) and methods (functions) that act on those data. Similarly, a class is a blueprint for that object.
Here's an implementation based on an id -> object
map (suggested by @nmclean) and collections.MutableSet
:
from collections import MutableSet
class IdentitySet(MutableSet):
key = id # should return a hashable object
def __init__(self, iterable=()):
self.map = {} # id -> object
self |= iterable # add elements from iterable to the set (union)
def __len__(self): # Sized
return len(self.map)
def __iter__(self): # Iterable
return self.map.itervalues()
def __contains__(self, x): # Container
return self.key(x) in self.map
def add(self, value): # MutableSet
"""Add an element."""
self.map[self.key(value)] = value
def discard(self, value): # MutableSet
"""Remove an element. Do not raise an exception if absent."""
self.map.pop(self.key(value), None)
def __repr__(self):
if not self:
return '%s()' % (self.__class__.__name__,)
return '%s(%r)' % (self.__class__.__name__, list(self))
Example:
a = (1, 2)
print IdentitySet([a, (1, 2), a])
# -> IdentitySet([(1, 2), (1, 2)]) # only one instance of `a`
print s != Set([a, (1, 2), a]) # it might be unequal because new literal
# tuple (1, 2) might have a different id
print s | Set([a, (1, 2), a])
# -> IdentitySet([(1, 2), (1, 2), (1, 2)]) # `a` plus two tuples from literals
MutableSet
automatically provides methods: clear
, pop
, remove
, __ior__
, __iand__
, __ixor__
, __isub__
, __le__
, __lt__
, __eq__
, __ne__
, __gt__
, __ge__
, __and__
, __or__
, __sub__
, __xor__
, and isdisjoint
.
Here, unlike in set
-based implementations, compound operations always use overridden basic methods e.g., |
(self.__or__()
, union) is implemented in terms of self.add()
and therefore it provides correct for IdentitySet
semantics.
You can wrap the Point objects in something that will compare their identities.
class Ref(object):
def __init__(self, value):
self.value = value
def __eq__(self, other):
return self.value is other.value
def __hash__(self):
return id(self.value)
Here is an IdentitySet
implementation which wraps its elements in Ref
objects:
from collections import MutableSet
class IdentitySet(MutableSet):
def __init__(self, items = []):
self.refs = set(map(Ref, items))
def __contains__(self, elem):
return Ref(elem) in self.refs
def __iter__(self):
return (ref.value for ref in self.refs)
def __len__(self):
return len(self.refs)
def add(self, elem):
self.refs.add(Ref(elem))
def discard(self, elem):
self.refs.discard(Ref(elem))
def __repr__(self):
return "%s(%s)" % (type(self).__name__, list(self))
Well, I don't know if this is the best approach, but how about a dictionary of id => object? i.e.:
def insert_object(obj):
id_dict[id(obj)] = obj
def contains_object(obj):
return id_dict.has_key(id(obj))
id
returns the true identity (the memory address), so comparing it is essentially equivalent to is
.
But, is there really a good reason why you can't just update your eq / hash methods?
EDIT - Here's an idea that combines this with @tom's approach of a set
subclass:
class IdentitySet(set):
def __init__(self, items = []):
set.__init__(self)
self._identitydict = {}
for item in items:
self.add(item)
def __contains__(self, elem):
return set.__contains__(self, id(elem))
def __iter__(self):
for identity in set.__iter__(self):
yield self._identitydict[identity]
def add(self, elem):
identity = id(elem)
set.add(self, identity)
self._identitydict[identity] = elem
def discard(self, elem):
if elem in self:
self.remove(elem)
def pop(self):
return self._identitydict.pop(set.pop(self))
def remove(self, elem):
identity = id(elem)
set.remove(self, identity)
del self._identitydict[identity]
The set contains ids, and the ids are duplicated as dictionary keys which are mapped to the objects. It's essentially the same approach except that @tom is wrapping the input in a Ref
and using id
as the Ref
's hash, whereas I'm just directly wrapping the input in id
. It may be more efficient without the intermediate class.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With