I have a list of Foo objects. A Foo object has several attributes, and two Foo instances are equivalent (equal) iff (if and only if) all their attributes are equal.
I have the following code:
import copy
class Foo(object):
    def __init__(self, myid):
        self.myid = myid
    def __eq__(self, other):
        if isinstance(other, self.__class__):
            print('DEBUG: self:', self.__dict__)
            print('DEBUG: other:', other.__dict__)
            return self.__dict__ == other.__dict__
        else:
            print('DEBUG: ATTEMPT TO COMPARE DIFFERENT CLASSES:', self.__class__, 'compared to:', other.__class__)
            return False
f1 = Foo(1)
f2 = Foo(2)
f3 = Foo(3)
f4 = Foo(4)
f5 = copy.deepcopy(f3)  # overkill here (I know), but needed for my real code
f_list = [f1, f2, f3, f4, f5]
# Surely, there must be a better way? (this doesn't work BTW!)
new_foo_list = list(set(f_list))
I have often used this little (anti-?)pattern of converting to a set and back when dealing with simple types (int, float, string and, surprisingly, datetime.datetime), but it has come a cropper with more involved data types like Foo above.
So, how could I change the list f_list above into a list of unique items, without having to loop through each item and check whether it already exists in some temporary cache, etc.?
What is the most pythonic way to do this?
First, I want to emphasize that using a set is certainly not an anti-pattern. Sets eliminate duplicates in expected O(n) time, which is the best you can do, and far better than the naive O(n^2) solution of comparing every item to every other item. It's even better than sorting, which costs O(n log n); and indeed, your data structure might not even have a natural order, in which case sorting doesn't make much sense.
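To make the complexity comparison concrete, here is a minimal sketch of the naive quadratic approach next to the set round-trip. The function names `dedupe_naive` and `dedupe_hashable` are hypothetical, introduced only for illustration. The naive version needs only `__eq__`, so it would work with the question's Foo as written; the set version requires hashable items.

```python
def dedupe_naive(items):
    """O(n^2): each membership test scans the kept items with ==.

    Needs only __eq__, so it works with the question's Foo as written.
    """
    unique = []
    for item in items:
        if item not in unique:  # linear scan using __eq__
            unique.append(item)
    return unique


def dedupe_hashable(items):
    """Expected O(n): the set/list round-trip; items must be hashable.

    The order of the result is not guaranteed.
    """
    return list(set(items))


print(dedupe_naive([3, 1, 3, 2, 1]))             # [3, 1, 2]
print(sorted(dedupe_hashable([3, 1, 3, 2, 1])))  # [1, 2, 3]
```

The naive version keeps the first occurrence and preserves order, but it pays for that with a full scan of `unique` on every step.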
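The problem with using a set in this case is that you have to define a custom __hash__ method that is consistent with your __eq__ (objects that compare equal must have equal hashes). Others have said this. Without it, Python 2 falls back on the default identity-based hash, so equal Foo objects are treated as distinct and nothing is removed; Python 3 makes a class that defines __eq__ but not __hash__ unhashable, so set(f_list) raises a TypeError. But whether or not you can write such a hash easily is an open question; it depends on details about your actual class that you haven't told us. For example, if any attributes of a Foo object are not hashable, then creating a custom hash function is going to be difficult, because you'll have to write not only a custom hash for Foo objects but also custom hashes (or hashable conversions) for every unhashable type stored in them!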
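For the simple Foo from the question, where the only attribute is an int, a __hash__ consistent with the dict-based __eq__ can be sketched as follows. This assumes every attribute value is hashable and that instances are not mutated while they sit in a set (mutating a hashed object silently breaks the set). The `__repr__` is added only to make results readable.

```python
import copy


class Foo(object):
    def __init__(self, myid):
        self.myid = myid

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.__dict__ == other.__dict__
        return NotImplemented

    def __hash__(self):
        # Hash exactly the data __eq__ compares. Sorting the (name, value)
        # pairs gives a stable order; assumes every value is hashable.
        return hash(tuple(sorted(self.__dict__.items())))

    def __repr__(self):
        return 'Foo(%r)' % (self.myid,)


f3 = Foo(3)
f_list = [Foo(1), Foo(2), f3, Foo(4), copy.deepcopy(f3)]
new_foo_list = list(set(f_list))  # order is not preserved
print(len(new_foo_list))  # 4
```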
So you need to tell us more about what kinds of attributes your class has if you want a conclusive answer. But I can offer some speculation.
Assuming that a hash function could be written for Foo objects, but also assuming that Foo objects are mutable and so really shouldn't have a __hash__ method, as Niklas B. points out, here is one workable approach. Create a function freeze that, given a mutable instance of Foo, returns an immutable collection of the data in Foo. So, for example, say Foo has a dict and a list in it; freeze returns a tuple containing a tuple of tuples (representing the dict) and another tuple (representing the list). The function freeze should have the following property:
freeze(a) == freeze(b)  if and only if  a == b
Now pass your list (dupe_list here) through the following code:
dupe_free = list(dict((freeze(x), x) for x in dupe_list).values())  # list() needed on Python 3, where .values() is a view
Now you have a dupe-free list in expected O(n) time. (Indeed, after adding this suggestion, I saw that fraxel suggested something similar; but I think using a custom function, or even a method, as in (x.freeze(), x), is the better way to go, rather than relying on __dict__ as he does, which can be unreliable. The same goes for your custom __eq__ method, IMO: __dict__ is not always a safe shortcut, for various reasons; for example, attributes stored in __slots__ don't appear in it, and it includes any incidental state such as cached values.)
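Here is a minimal sketch of freeze for a hypothetical mutable Foo holding a list and a dict (the attribute names `tags` and `meta` are made up for illustration), following the tuple-of-tuples scheme described above. It assumes the list elements and the dict's keys and values are themselves hashable, and that the dict keys are mutually sortable.

```python
import copy


class Foo(object):
    """Hypothetical mutable Foo with a list and a dict attribute."""

    def __init__(self, myid, tags, meta):
        self.myid = myid
        self.tags = list(tags)
        self.meta = dict(meta)

    def __eq__(self, other):
        if not isinstance(other, Foo):
            return NotImplemented
        return (self.myid, self.tags, self.meta) == (other.myid, other.tags, other.meta)

    __hash__ = None  # mutable, so deliberately unhashable


def freeze(foo):
    """Immutable, hashable snapshot with freeze(a) == freeze(b) iff a == b
    (under the assumptions stated above)."""
    return (
        foo.myid,
        tuple(foo.tags),                  # list -> tuple (order kept)
        tuple(sorted(foo.meta.items())),  # dict -> sorted tuple of (key, value) pairs
    )


a = Foo(1, ['x'], {'k': 1})
dupe_list = [a, Foo(2, [], {}), copy.deepcopy(a)]
dupe_free = list(dict((freeze(x), x) for x in dupe_list).values())
print(len(dupe_free))  # 2
```

Because later entries overwrite earlier ones in the dict, each group of duplicates ends up represented by its last occurrence.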
Another approach would be to use only immutable objects in the first place! For example, you could use namedtuples. Here's an example stolen from the Python docs:
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(11, y=22) # instantiate with positional or keyword arguments
>>> p[0] + p[1] # indexable like the plain tuple (11, 22)
33
>>> x, y = p # unpack like a regular tuple
>>> x, y
(11, 22)
>>> p.x + p.y # fields also accessible by name
33
>>> p # readable __repr__ with a name=value style
Point(x=11, y=22)
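Because namedtuple instances compare and hash by their field values, the set round-trip from the question works on them directly. A minimal sketch, assuming Foo can be modelled as a namedtuple whose single field myid mirrors the question's class:

```python
import copy
from collections import namedtuple

# Immutable stand-in for the question's Foo; == and hash() come from tuple.
Foo = namedtuple('Foo', ['myid'])

f3 = Foo(3)
f_list = [Foo(1), Foo(2), f3, Foo(4), copy.deepcopy(f3)]
new_foo_list = list(set(f_list))  # order is not preserved
print(sorted(new_foo_list))  # [Foo(myid=1), Foo(myid=2), Foo(myid=3), Foo(myid=4)]
```

One caveat: a namedtuple compares equal to any tuple with the same values, including an instance of a different namedtuple type, which is looser than the isinstance check in the question's __eq__.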