I need a set that uses an identity (<code>is</code>) comparison for <code>add()</code>, as opposed to a value (<code>==</code>) comparison. For example, I have defined a <code>Point</code> class (with immutable x/y) that hooks <code>__eq__</code> and <code>__hash__</code>. <code>Point</code>s compare themselves correctly, and two instances of <code>Point</code> with the same x/y value answer <code>True</code> to <code>==</code> and <code>False</code> to <code>is</code>. I need to add two such instances to my set, and it's important that the result contain both instances even though their x/y values are the same. Some Smalltalks define IdentitySet for this very purpose. I'm wondering, for example, if I can patch the built-in set class to include <code>identityAdd</code> and <code>identityRemove</code> methods. That will probably work, but it seems like a hack. Is there a better way?

Here's an implementation based on an <code>id -> object</code> map (suggested by @nmclean) and <code>collections.MutableSet</code>: <pre class="prettyprint"><code>from collections import MutableSet class IdentitySet(MutableSet): key = id # should return a hashable object def __init__(self, iterable=()): self.map = {} # id -> object self |= iterable # add elements from iterable to the set (union) def __len__(self): # Sized return len(self.map) def __iter__(self): # Iterable return self.map.itervalues() def __contains__(self, x): # Container return self.key(x) in self.map def add(self, value): # MutableSet """Add an element.""" self.map[self.key(value)] = value def discard(self, value): # MutableSet """Remove an element. Do not raise an exception if absent.""" self.map.pop(self.key(value), None) def __repr__(self): if not self: return '%s()' % (self.__class__.__name__,) return '%s(%r)' % (self.__class__.__name__, list(self)) </code></pre> Example: <pre class="prettyprint"><code>a = (1, 2) print IdentitySet([a, (1, 2), a]) # -> IdentitySet([(1, 2), (1, 2)]) # only one instance of `a` print s != Set([a, (1, 2), a]) # it might be unequal because new literal # tuple (1, 2) might have a different id print s | Set([a, (1, 2), a]) # -> IdentitySet([(1, 2), (1, 2), (1, 2)]) # `a` plus two tuples from literals </code></pre> <code>MutableSet</code> automatically provides methods: <code>clear</code>, <code>pop</code>, <code>remove</code>, <code>__ior__</code>, <code>__iand__</code>, <code>__ixor__</code>, <code>__isub__</code>, <code>__le__</code>, <code>__lt__</code>, <code>__eq__</code>, <code>__ne__</code>, <code>__gt__</code>, <code>__ge__</code>, <code>__and__</code>, <code>__or__</code>, <code>__sub__</code>, <code>__xor__</code>, and <code>isdisjoint</code>. Here, unlike in <code>set</code>-based implementations, compound operations always use overridden basic methods e.g., <code>|</code> (<code>self.__or__()</code>, union) is implemented in terms of <code>self.add()</code> and therefore it provides correct for <code>IdentitySet</code> semantics.

You can wrap the Point objects in something that will compare their identities. <pre class="prettyprint"><code>class Ref(object): def __init__(self, value): self.value = value def __eq__(self, other): return self.value is other.value def __hash__(self): return id(self.value) </code></pre> Here is an <code>IdentitySet</code> implementation which wraps its elements in <code>Ref</code> objects: <pre class="prettyprint"><code>from collections import MutableSet class IdentitySet(MutableSet): def __init__(self, items = []): self.refs = set(map(Ref, items)) def __contains__(self, elem): return Ref(elem) in self.refs def __iter__(self): return (ref.value for ref in self.refs) def __len__(self): return len(self.refs) def add(self, elem): self.refs.add(Ref(elem)) def discard(self, elem): self.refs.discard(Ref(elem)) def __repr__(self): return "%s(%s)" % (type(self).__name__, list(self)) </code></pre>

IdentitySet in Python?

Tags:

python

python-2.7

I need a set that uses an identity (is) comparison for add(), as opposed to a value (==) comparison.

For example, I have defined a Point class (with immutable x/y) that hooks __eq__ and __hash__. Points compare themselves correctly, and two instances of Point with the same x/y value answer True to == and False to is. I need to add two such instances to my set, and it's important that the result contain both instances even though their x/y values are the same. Some Smalltalks define IdentitySet for this very purpose.

I'm wondering, for example, if I can patch the built-in set class to include identityAdd and identityRemove methods. That will probably work, but it seems like a hack.

Is there a better way?

281

asked Jun 07 '13 23:06

Tom Stambaugh

3 Answers

Here's an implementation based on an id -> object map (suggested by @nmclean) and collections.MutableSet:

from collections import MutableSet

class IdentitySet(MutableSet):
    key = id  # should return a hashable object

    def __init__(self, iterable=()):
        self.map = {} # id -> object
        self |= iterable  # add elements from iterable to the set (union)

    def __len__(self):  # Sized
        return len(self.map)

    def __iter__(self):  # Iterable
        return self.map.itervalues()

    def __contains__(self, x):  # Container
        return self.key(x) in self.map

    def add(self, value):  # MutableSet
        """Add an element."""
        self.map[self.key(value)] = value

    def discard(self, value):  # MutableSet
        """Remove an element.  Do not raise an exception if absent."""
        self.map.pop(self.key(value), None)

    def __repr__(self):
        if not self:
            return '%s()' % (self.__class__.__name__,)
        return '%s(%r)' % (self.__class__.__name__, list(self))

Example:

a = (1, 2)
print IdentitySet([a, (1, 2), a])
# -> IdentitySet([(1, 2), (1, 2)]) # only one instance of `a`

print s != Set([a, (1, 2), a])  # it might be unequal because new literal
                                # tuple (1, 2) might have a different id
print s | Set([a, (1, 2), a])
# -> IdentitySet([(1, 2), (1, 2), (1, 2)]) # `a` plus two tuples from literals

MutableSet automatically provides methods: clear, pop, remove, __ior__, __iand__, __ixor__, __isub__, __le__, __lt__, __eq__, __ne__, __gt__, __ge__, __and__, __or__, __sub__, __xor__, and isdisjoint.

Here, unlike in set-based implementations, compound operations always use overridden basic methods e.g., | (self.__or__(), union) is implemented in terms of self.add() and therefore it provides correct for IdentitySet semantics.

answered Oct 05 '22 05:10

jfs

You can wrap the Point objects in something that will compare their identities.

class Ref(object):
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value is other.value

    def __hash__(self):
        return id(self.value)

Here is an IdentitySet implementation which wraps its elements in Ref objects:

from collections import MutableSet

class IdentitySet(MutableSet):
    def __init__(self, items = []):
        self.refs = set(map(Ref, items))

    def __contains__(self, elem):
        return Ref(elem) in self.refs

    def __iter__(self):
        return (ref.value for ref in self.refs)

    def __len__(self):
        return len(self.refs)

    def add(self, elem):
        self.refs.add(Ref(elem))

    def discard(self, elem):
        self.refs.discard(Ref(elem))

    def __repr__(self):
        return "%s(%s)" % (type(self).__name__, list(self))

answered Oct 05 '22 05:10

tom

Well, I don't know if this is the best approach, but how about a dictionary of id => object? i.e.:

def insert_object(obj):
    id_dict[id(obj)] = obj

def contains_object(obj):
    return id_dict.has_key(id(obj))

id returns the true identity (the memory address), so comparing it is essentially equivalent to is.

But, is there really a good reason why you can't just update your eq / hash methods?

EDIT - Here's an idea that combines this with @tom's approach of a set subclass:

class IdentitySet(set):
    def __init__(self, items = []):
        set.__init__(self)
        self._identitydict = {}
        for item in items:
            self.add(item)

    def __contains__(self, elem):
        return set.__contains__(self, id(elem))

    def __iter__(self):
        for identity in set.__iter__(self):
            yield self._identitydict[identity]

    def add(self, elem):
        identity = id(elem)
        set.add(self, identity)
        self._identitydict[identity] = elem

    def discard(self, elem):
        if elem in self:
            self.remove(elem)

    def pop(self):
        return self._identitydict.pop(set.pop(self))

    def remove(self, elem):
        identity = id(elem)
        set.remove(self, identity)
        del self._identitydict[identity]

The set contains ids, and the ids are duplicated as dictionary keys which are mapped to the objects. It's essentially the same approach except that @tom is wrapping the input in a Ref and using id as the Ref's hash, whereas I'm just directly wrapping the input in id. It may be more efficient without the intermediate class.

answered Oct 05 '22 03:10

nmclean

Related questions
                            
                                Check if directory is symlink?
                            
                                What is the meaning of the phrase "registering a callback" with Python APIs? [closed]
                            
                                Why a number like 01 gives a Syntax error in python interactive mode [duplicate]
                            
                                Convert dict into a config file then restore the dict from config
                            
                                How to use a single variable instead of passing multiple Arguments in Python
                            
                                How to not put self. when initializing every attribute in a Python class?
                            
                                How to create a non-instantiable class?
                            
                                How to use user input to end a program?
                            
                                How to Use Sprite Collide in Pygame
                            
                                Efficient way to convert a list to dictionary
                            
                                OptionMenu won't show the first option when clicked (Tkinter)
                            
                                Validating user input strings in Python
                            
                                Is a function call an expression in python?
                            
                                Failed to start devlopment server -- BindError: Unable to find a consistent port localhost
                            
                                Why doesn't Python delete the iterate variable after a loop? [duplicate]
                            
                                is it possible to get all possible urls?
                            
                                sort string by number inside
                            
                                How to format list and dictionary comprehensions
                            
                                Python: Logging all of a class' methods without decorating each one
                            
                                Writing filenames from a folder into a csv

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With