The most convenient, "Pythonic" way to remove duplicates from a list is basically:
mylist = list(set(mylist))
But suppose your criterion for counting something as a duplicate depends on a particular member field of the objects contained in mylist.
Well, one solution is to just define __eq__ and __hash__ for the objects in mylist, and then the classic list(set(mylist)) will work.
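For example, here is a minimal sketch; the Person class and its fields are made up for illustration, assuming two objects should count as duplicates whenever their firstname matches:

class Person:
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname
    def __eq__(self, other):
        # duplicates are decided by firstname only
        return isinstance(other, Person) and self.firstname == other.firstname
    def __hash__(self):
        # must be consistent with __eq__: equal objects must hash equally
        return hash(self.firstname)

people = [Person("Ada", "Lovelace"), Person("Ada", "Byron"), Person("Alan", "Turing")]
people = list(set(people))  # one Person per distinct firstname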
But sometimes you have requirements that call for a bit more flexibility. It would be very convenient to be able to create on-the-fly lambdas to use custom comparison routines to identify duplicates in different ways. Ideally, something like:
mylist = list(set(mylist, key = lambda x: x.firstname))
Of course, that doesn't actually work, because the set constructor doesn't take a key or comparison function, and set requires its elements to be hashable anyway.
So what's the closest way to achieve something like that, so that you can remove duplicates using arbitrary comparison functions?
You can use a dict instead of a set, with the field you're de-duplicating on as the dict key and the object itself as the value:
d = {x.firstname: x for x in mylist}
mylist = list(d.values())
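Note that with the dict comprehension a later item overwrites an earlier one with the same key, so this keeps the last occurrence of each firstname (and, on Python 3.7+, preserves insertion order). If you want the key to be configurable, a small helper along these lines works; unique_by is just an illustrative name, not a library function:

def unique_by(items, key):
    # build a dict keyed by key(x); later duplicates overwrite earlier ones
    return list({key(x): x for x in items}.values())

mylist = unique_by(mylist, key=lambda x: x.firstname)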
I would do this:
seen = set()
newlist = []
for item in mylist:
    if item.firstname not in seen:
        newlist.append(item)
        seen.add(item.firstname)
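To get the flexibility the question asks for, the same loop can be wrapped in a function that takes an arbitrary key lambda. This is a sketch, and unique_everseen_by is a made-up name (the itertools documentation's unique_everseen recipe is essentially this idea); unlike the dict version, it keeps the first occurrence of each key:

def unique_everseen_by(items, key):
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:  # first time we see this key
            seen.add(k)
            result.append(item)
    return result

mylist = unique_everseen_by(mylist, key=lambda x: x.firstname)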