Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to remove duplicates in set for objects?

Tags:

python

I have set of objects:

class Test(object):
    def __init__(self):
        self.i = random.randint(1,10)


res = set()

for i in range(0,1000):
    res.add(Test())

print len(res) = 1000

How to remove duplicates from set of objects ?

Thanks for answers, it's work:

class Test(object):
    def __init__(self, i):
        self.i = i
    #   self.i = random.randint(1,10)
    #   self.j = random.randint(1,20)

    def __keys(self):
        t = ()
        for key in self.__dict__:
            t = t + (self.__dict__[key],)
        return t

    def __eq__(self, other):
        return isinstance(other, Test) and self.__keys() == other.__keys()

    def __hash__(self):
        return hash(self.__keys())

res = set()

res.add(Test(2))
...
res.add(Test(8))

result: [2,8,3,4,5,6,7]

but how to save order ? Sets not support order. Can i use list instead set for example ?

like image 545
Bdfy Avatar asked Mar 22 '13 21:03

Bdfy


People also ask

How do you remove duplicates from a Set of objects in Java?

distinct() You can use the distinct() method from the Stream API. The distinct() method return a new Stream without duplicates elements based on the result returned by equals() method, which can be used for further processing.

Does HashSet remove duplicates?

As we know that the HashSet contains only unique elements, ie no duplicate entries are allowed, and since our aim is to remove the duplicate entries from the collection, so for removing all the duplicate entries from the collection, we will use HashSet.

Does Set in python remove duplicates?

set() functionPython set doesn't have duplicate elements.


2 Answers

Your objects must be hashable (i.e. must have __eq__() and __hash__() defined) for sets to work properly with them:

class Test(object):
    def __init__(self):
        self.i = random.randint(1, 10)

    def __eq__(self, other):
        return self.i == other.i

    def __hash__(self):
        return self.i

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

 

If you have several attributes, hash and compare a tuple of them (thanks, delnan):

class Test(object):
    def __init__(self):
        self.i = random.randint(1, 10)
        self.k = random.randint(1, 10)
        self.j = random.randint(1, 10)

    def __eq__(self, other):
        return (self.i, self.k, self.j) == (other.i, other.k, other.j)

    def __hash__(self):
        return hash((self.i, self.k, self.j))
like image 58
Pavel Anossov Avatar answered Oct 04 '22 17:10

Pavel Anossov


Your first question is already answered by Pavel Anossov.

But you have another question:

but how to save order ? Sets not support order. Can i use list instead set for example ?

You can use a list, but there are a few downsides:

  • You get the wrong interface.
  • You don't get automatic handling of duplicates. You have to explicitly write if foo not in res: res.append(foo). Obviously, you can wrap this up in a function instead of writing it repeatedly, but it's still extra work.
  • It's going to be a lot less efficient if the collection can get large. Basically, adding a new element, checking whether an element already exists, etc. are all going to be O(N) instead of O(1).

What you want is something that works like an ordered set. Or, equivalently, like a list that doesn't allow duplicates.

If you do all your adds first, and then all your lookups, and you don't need lookups to be fast, you can get around this by first building a list, then using unique_everseen from the itertools recipes to remove duplicates.

Or you could just keep a set and a list or elements by order (or a list plus a set of elements seen so far). But that can get a bit complicated, so you might want to wrap it up.

Ideally, you want to wrap it up in a type that has exactly the same API as set. Something like an OrderedSet akin to collections.OrderedDict.

Fortunately, if you scroll to the bottom of that docs page, you'll see that exactly what you want already exists; there's a link to an OrderedSet recipe at ActiveState.

So, copy it, paste it into your code, then just change res = set() to res = OrderedSet(), and you're done.

like image 29
abarnert Avatar answered Oct 04 '22 16:10

abarnert