Remove duplicates in list of object with Python

I've got a list of objects and I've got a db table full of records. My list of objects has a title attribute and I want to remove any objects with duplicate titles from the list (leaving the original).

Then I want to check whether my list of objects has any duplicates of records already in the database and, if so, remove those items from the list before adding the rest to the database.

I have seen solutions for removing duplicates from a list like this: myList = list(set(myList)), but I'm not sure how to do that with a list of objects.

I need to maintain the order of my list of objects too. I was also thinking maybe I could use difflib to check for differences in the titles.

imns asked Nov 12 '10 21:11



People also ask

How do I remove duplicates from an ordered list in Python?

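Using set() is a simple and fast way to remove duplicate elements: convert the list into a set, which keeps only unique elements, then convert it back into a list. Note, however, that a set has no defined order, so this does not preserve the original order of the list. To keep the order, use list(dict.fromkeys(my_list)), which keeps the first occurrence of each element (dicts preserve insertion order in Python 3.7+).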
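Because a set has no defined order, converting back with list(set(...)) can reorder the elements. A minimal order-preserving sketch, assuming Python 3.7+ (where dicts keep insertion order) and hashable elements; the function name is illustrative:

```python
def dedupe_keep_order(items):
    """Return items with duplicates removed, keeping first occurrences in order.

    Relies on dict preserving insertion order (guaranteed since Python 3.7);
    every element must be hashable.
    """
    return list(dict.fromkeys(items))


print(dedupe_keep_order([3, 1, 3, 2, 1]))  # [3, 1, 2]
```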

How do you extract duplicates from a list in Python?

If you want to extract only the duplicated elements from the original list, use collections.Counter(), which returns a collections.Counter (a dict subclass) whose keys are the elements and whose values are their counts. Since it is a dict subclass, you can retrieve keys and values with items().
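A short sketch of the Counter approach; the helper name is hypothetical. Counter is a dict subclass, so in Python 3.7+ its items() follow first-insertion order:

```python
from collections import Counter


def duplicated_elements(items):
    """Return the elements that occur more than once, in first-seen order."""
    counts = Counter(items)
    # items() yields (element, count) pairs, just like a regular dict.
    return [element for element, count in counts.items() if count > 1]


print(duplicated_elements(["a", "b", "a", "c", "b", "a"]))  # ['a', 'b']
```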


2 Answers

set(list_of_objects) will only remove duplicates if Python knows what a duplicate is; that is, you need to define what makes an object unique.

To do that, you need to make the object hashable by defining both the __hash__ and __eq__ methods; here is how:

http://docs.python.org/glossary.html#term-hashable

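Note, though, that defining __eq__ alone is not enough for set(): in Python 3, a class that defines __eq__ but not __hash__ becomes unhashable, so you need both.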
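A quick check of the relevant Python 3 behaviour, using hypothetical classes: defining __eq__ without __hash__ sets __hash__ to None, so instances cannot go into a set at all, while a class with neither method is hashed by identity, so two objects with equal titles are still kept as two separate set elements.

```python
def can_hash(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


class OnlyEq:
    """Hypothetical class that defines __eq__ but not __hash__."""

    def __init__(self, title):
        self.title = title

    def __eq__(self, other):
        if not isinstance(other, OnlyEq):
            return NotImplemented
        return self.title == other.title


class Plain:
    """Hypothetical class with neither method: hashed and compared by identity."""

    def __init__(self, title):
        self.title = title


print(OnlyEq.__hash__ is None)         # True: Python 3 disables hashing here
print(can_hash(OnlyEq("a")))           # False, so set([OnlyEq("a")]) raises TypeError
print(can_hash(Plain("a")))            # True, but only by identity
print(len({Plain("a"), Plain("a")}))   # 2: equal titles are not merged
```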

EDIT: How to implement the __eq__ method:

As I mentioned, you'll need to know what makes your object unique. Suppose we have a Book with attributes author_name and title whose combination is unique (so we can have many books authored by Stephen King, and many books named The Shining, but only one book named The Shining by Stephen King); then the implementation is as follows:

    def __eq__(self, other):
        return (self.author_name == other.author_name
                and self.title == other.title)

Similarly, this is how I sometimes implement the __hash__ method:

    def __hash__(self):
        return hash(('title', self.title,
                     'author_name', self.author_name))

You can check that if you create a list of two books with the same author and title, the book objects will be equal (with the == operator), although they are still two distinct objects (is returns False). Also, when set() is used, it will remove one of the two books.
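Putting the two methods together, here is a self-contained sketch with a hypothetical Book class. The hash uses a plain tuple of the same two fields (a slight simplification of the version above), and __eq__ returns NotImplemented for non-Book operands. It also shows dict.fromkeys as a way to drop duplicates while keeping the list order, which set() does not do.

```python
class Book:
    """Hypothetical Book whose identity is the (author_name, title) pair."""

    def __init__(self, author_name, title):
        self.author_name = author_name
        self.title = title

    def __eq__(self, other):
        if not isinstance(other, Book):
            return NotImplemented
        return (self.author_name == other.author_name
                and self.title == other.title)

    def __hash__(self):
        # Must agree with __eq__: equal books must produce equal hashes.
        return hash((self.author_name, self.title))

    def __repr__(self):
        return f"Book({self.author_name!r}, {self.title!r})"


books = [
    Book("Stephen King", "The Shining"),
    Book("Stephen King", "It"),
    Book("Stephen King", "The Shining"),
]

print(books[0] == books[2])        # True: equal according to __eq__
print(books[0] is books[2])        # False: still two distinct objects
print(len(set(books)))             # 2: set() drops the duplicate (order not kept)
print(list(dict.fromkeys(books)))  # first occurrences, in original order
```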

EDIT: This is an old answer of mine, but I only now noticed that the previous paragraph had an error: objects with the same hash() are not the same object, so comparing them with is gives False. Hashability matters, however, if you intend to use the objects as elements of a set or as keys in a dictionary.

vonPetrushev answered Sep 21 '22 22:09



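Since your objects aren't hashable by title (by default they hash by identity), putting them in a set won't remove objects with duplicate titles. The titles themselves are hashable, though, so track those instead.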

Here's the first part.

    seen_titles = set()
    new_list = []
    for obj in myList:
        if obj.title not in seen_titles:
            new_list.append(obj)
            seen_titles.add(obj.title)
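The same seen-set loop can be wrapped in a reusable generator that takes a key function, so it works for any attribute rather than only title. A sketch; unique_by is a hypothetical name, and SimpleNamespace stands in for your own objects:

```python
from operator import attrgetter
from types import SimpleNamespace


def unique_by(items, key):
    """Yield items whose key(item) has not been seen before, preserving order."""
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


my_list = [SimpleNamespace(title=t) for t in ["a", "b", "a", "c"]]
titles = [obj.title for obj in unique_by(my_list, key=attrgetter("title"))]
print(titles)  # ['a', 'b', 'c']
```

Only the key values need to be hashable, so the objects themselves do not need __eq__ or __hash__.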

You're going to need to describe what database/ORM etc. you're using for the second part though.
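Without knowing the actual database or ORM, here is a sketch of the second part under loudly stated assumptions: it uses the standard-library sqlite3 module and a hypothetical table records with a title column. It loads the existing titles into a set, then keeps only objects whose title is neither already in the table nor repeated earlier in the list, before inserting them.

```python
import sqlite3
from types import SimpleNamespace


def filter_new_titles(conn, objects):
    """Return objects whose title is not already in the (hypothetical) records table.

    Assumes a table ``records`` with a ``title`` column; adapt to your schema/ORM.
    Duplicates within ``objects`` are dropped too, keeping the first occurrence.
    """
    existing = {row[0] for row in conn.execute("SELECT title FROM records")}
    result = []
    for obj in objects:
        if obj.title not in existing:
            result.append(obj)
            existing.add(obj.title)  # skip later repeats in the same list
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (title TEXT UNIQUE)")
conn.execute("INSERT INTO records (title) VALUES ('old')")

incoming = [SimpleNamespace(title=t) for t in ["new", "old", "new", "other"]]
fresh = filter_new_titles(conn, incoming)
conn.executemany("INSERT INTO records (title) VALUES (?)",
                 [(obj.title,) for obj in fresh])
conn.commit()
print([obj.title for obj in fresh])  # ['new', 'other']
```

Loading every existing title is fine for small tables; for a large table you would more likely query only the incoming titles, or rely on a UNIQUE constraint in the database.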

aaronasterling answered Sep 25 '22 22:09
