I have set of objects:
class Test(object):
def __init__(self):
self.i = random.randint(1,10)
res = set()
for i in range(0,1000):
res.add(Test())
print len(res) = 1000
How to remove duplicates from set of objects ?
Thanks for answers, it's work:
class Test(object):
def __init__(self, i):
self.i = i
# self.i = random.randint(1,10)
# self.j = random.randint(1,20)
def __keys(self):
t = ()
for key in self.__dict__:
t = t + (self.__dict__[key],)
return t
def __eq__(self, other):
return isinstance(other, Test) and self.__keys() == other.__keys()
def __hash__(self):
return hash(self.__keys())
res = set()
res.add(Test(2))
...
res.add(Test(8))
result: [2,8,3,4,5,6,7]
but how to save order ? Sets not support order. Can i use list instead set for example ?
distinct() You can use the distinct() method from the Stream API. The distinct() method return a new Stream without duplicates elements based on the result returned by equals() method, which can be used for further processing.
As we know that the HashSet contains only unique elements, ie no duplicate entries are allowed, and since our aim is to remove the duplicate entries from the collection, so for removing all the duplicate entries from the collection, we will use HashSet.
set() functionPython set doesn't have duplicate elements.
Your objects must be hashable (i.e. must have __eq__()
and __hash__()
defined) for sets to work properly with them:
class Test(object):
def __init__(self):
self.i = random.randint(1, 10)
def __eq__(self, other):
return self.i == other.i
def __hash__(self):
return self.i
An object is hashable if it has a hash value which never changes during its lifetime (it needs a
__hash__()
method), and can be compared to other objects (it needs an__eq__()
or__cmp__()
method). Hashable objects which compare equal must have the same hash value.Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
If you have several attributes, hash and compare a tuple of them (thanks, delnan
):
class Test(object):
def __init__(self):
self.i = random.randint(1, 10)
self.k = random.randint(1, 10)
self.j = random.randint(1, 10)
def __eq__(self, other):
return (self.i, self.k, self.j) == (other.i, other.k, other.j)
def __hash__(self):
return hash((self.i, self.k, self.j))
Your first question is already answered by Pavel Anossov.
But you have another question:
but how to save order ? Sets not support order. Can i use list instead set for example ?
You can use a list
, but there are a few downsides:
if foo not in res: res.append(foo)
. Obviously, you can wrap this up in a function instead of writing it repeatedly, but it's still extra work.What you want is something that works like an ordered set
. Or, equivalently, like a list
that doesn't allow duplicates.
If you do all your adds first, and then all your lookups, and you don't need lookups to be fast, you can get around this by first building a list
, then using unique_everseen
from the itertools
recipes to remove duplicates.
Or you could just keep a set
and a list
or elements by order (or a list
plus a set
of elements seen so far). But that can get a bit complicated, so you might want to wrap it up.
Ideally, you want to wrap it up in a type that has exactly the same API as set
. Something like an OrderedSet
akin to collections.OrderedDict
.
Fortunately, if you scroll to the bottom of that docs page, you'll see that exactly what you want already exists; there's a link to an OrderedSet
recipe at ActiveState.
So, copy it, paste it into your code, then just change res = set()
to res = OrderedSet()
, and you're done.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With