
Removing duplicates using custom comparisons

The most convenient, "Pythonic" way to remove duplicates from a list is basically:

mylist = list(set(mylist))

But suppose your criterion for counting a duplicate depends on a particular member field of the objects contained in mylist.

Well, one solution is to just define __eq__ and __hash__ for the objects in mylist, and then the classic list(set(mylist)) will work.
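A minimal sketch of that approach, assuming a hypothetical `Person` class where two objects count as duplicates when their `firstname` fields match:

```python
class Person:
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname

    def __eq__(self, other):
        # Two Person objects are "equal" if their firstnames match.
        return isinstance(other, Person) and self.firstname == other.firstname

    def __hash__(self):
        # Must be consistent with __eq__: equal objects get equal hashes.
        return hash(self.firstname)

people = [Person("Ann", "Smith"), Person("Ann", "Jones"), Person("Bob", "Lee")]
unique = list(set(people))
print(len(unique))  # 2 -- the two "Ann" objects count as duplicates
```

Note that this bakes one comparison rule into the class itself, which is exactly the inflexibility the question is about.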

But sometimes you have requirements that call for a bit more flexibility. It would be very convenient to be able to create on-the-fly lambdas to use custom comparison routines to identify duplicates in different ways. Ideally, something like:

mylist = list(set(mylist, key = lambda x: x.firstname))

Of course, that doesn't actually work, because the set constructor doesn't take a key function, and set elements have to be hashable anyway.

So what's the closest way to achieve something like that, so that you can remove duplicates using arbitrary comparison functions?

Channel72 asked Oct 04 '12 15:10

2 Answers

You can use a dict instead of a set, with the field you want to deduplicate on as the dict key; duplicate keys collapse, so only one object per key survives:

d = {x.firstname: x for x in mylist}
mylist = list(d.values())
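This dict trick can be generalized into a small helper taking an arbitrary key function, which approximates the hypothetical `set(mylist, key=...)` from the question. `unique_by` is an illustrative name, not a stdlib function:

```python
from types import SimpleNamespace

def unique_by(items, key):
    # Later items overwrite earlier ones, so the *last* duplicate wins;
    # dicts preserve insertion order in Python 3.7+.
    return list({key(x): x for x in items}.values())

people = [SimpleNamespace(firstname="Ann", age=30),
          SimpleNamespace(firstname="Ann", age=40),
          SimpleNamespace(firstname="Bob", age=25)]
unique = unique_by(people, key=lambda p: p.firstname)
# Two objects remain; the second "Ann" (age 40) is the one kept.
```

If you need the *first* occurrence to win instead, see the loop-based answer below, which checks a seen-set before appending.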
interjay answered Oct 11 '22 13:10

I would do this:

seen = set()
newlist = []
for item in mylist:
    if item.firstname not in seen:
        newlist.append(item)
        seen.add(item.firstname)
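This keeps the first occurrence of each firstname. A sketch of the same idea wrapped into a reusable generator with a key function (the name `dedupe` is illustrative):

```python
def dedupe(items, key):
    # Yield each item whose key hasn't been seen yet, preserving order;
    # the first occurrence for each key wins.
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item

# usage: mylist = list(dedupe(mylist, key=lambda x: x.firstname))
```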
Tim Pietzcker answered Oct 11 '22 13:10