
Python, remove duplicates from list of tuples

Tags: python, list, items

I have the following list:

[('mail', 167, datetime.datetime(2010, 9, 29)),
 ('name', 1317, datetime.datetime(2011, 12, 12)), 
 ('mail', 1045, datetime.datetime(2010, 8, 13)), 
 ('name', 3, datetime.datetime(2011, 11, 3))]

I want to remove every tuple that shares its first item with another tuple, keeping only the one with the latest date for each first item. In other words, I need to get this:

[('mail', 167, datetime.datetime(2010, 9, 29)),
 ('name', 1317, datetime.datetime(2011, 12, 12))]
alexvassel asked Dec 04 '22 08:12


2 Answers

You can use a dictionary to store, for each key, the tuple with the latest date found so far:

temp = {}
for key, number, date in input_list:  # input_list is the list from the question
    # Keep the tuple if we see this key for the first time,
    # or if the new date is later than the one stored so far.
    if key not in temp or temp[key][2] < date:
        temp[key] = (key, number, date)
result = list(temp.values())  # list() is needed on Python 3, where values() returns a view
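To make the approach above easy to try, here is a minimal self-contained sketch that runs it on the sample data from the question. The function name latest_per_key is only an illustrative assumption, not part of the original answer.

```python
import datetime

# Sample data copied from the question.
input_list = [
    ('mail', 167, datetime.datetime(2010, 9, 29)),
    ('name', 1317, datetime.datetime(2011, 12, 12)),
    ('mail', 1045, datetime.datetime(2010, 8, 13)),
    ('name', 3, datetime.datetime(2011, 11, 3)),
]


def latest_per_key(rows):
    """Return, for each first element, the tuple with the latest date.

    Hypothetical helper name; the logic is the single-pass dictionary scan.
    """
    latest = {}
    for row in rows:
        key, _number, date = row
        # Store the row if the key is new or this row's date is later.
        if key not in latest or latest[key][2] < date:
            latest[key] = row
    return list(latest.values())


result = latest_per_key(input_list)
```

Because the comparison is a strict <, a row whose date equals the stored one does not replace it, so the first row seen wins a tie.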
Björn Pollex answered Dec 21 '22 21:12


The following approach uses a dictionary to overwrite entries that share the same key. Because the list is first sorted by date, older entries are overwritten by newer ones.

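temp = {}
for v in sorted(L, key=lambda t: t[2]):  # where L is your list; t[2] is the date
    temp[v[0]] = v  # a later date overwrites an earlier one for the same key
result = list(temp.values())  # list() is needed on Python 3, where values() returns a view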

Or, for something a lot more compact (but much less readable):

result = list({v[0]: v for v in sorted(L, key=lambda t: t[2])}.values())
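As a quick check that the loop and the compact form behave the same way, the following sketch runs both on the question's sample data. The names loop_result and compact_result are assumptions introduced here for illustration.

```python
import datetime

# Sample data copied from the question.
L = [
    ('mail', 167, datetime.datetime(2010, 9, 29)),
    ('name', 1317, datetime.datetime(2011, 12, 12)),
    ('mail', 1045, datetime.datetime(2010, 8, 13)),
    ('name', 3, datetime.datetime(2011, 11, 3)),
]

# Loop form: sort by date, then let newer rows overwrite older ones.
temp = {}
for v in sorted(L, key=lambda t: t[2]):
    temp[v[0]] = v
loop_result = list(temp.values())

# Compact form: the same idea as a dict comprehension.
compact_result = list({v[0]: v for v in sorted(L, key=lambda t: t[2])}.values())
```

Note that sorted() is stable, so when two rows with the same key also share the same date, the one that appears later in the original list is the one that survives.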

Update

This method would be reasonably quick if the list is already (or mostly) sorted by date. If it isn't, and especially if it is a large list, then this may not be the best approach.

For unsorted lists, you will likely get some performance improvement by sorting by the key first and then by the date, i.e. sorted(L, key=lambda t: (t[0], t[2])).
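The sketch below only illustrates that sorting by (key, date) yields the same result as sorting by date alone. It does not measure the performance difference, which would need to be timed (for example with the standard timeit module) on realistic data. The helper name latest_by_sorted_overwrite is a hypothetical one chosen for this example.

```python
import datetime

# Sample data copied from the question.
L = [
    ('mail', 167, datetime.datetime(2010, 9, 29)),
    ('name', 1317, datetime.datetime(2011, 12, 12)),
    ('mail', 1045, datetime.datetime(2010, 8, 13)),
    ('name', 3, datetime.datetime(2011, 11, 3)),
]


def latest_by_sorted_overwrite(rows, sort_key):
    """Sort rows with sort_key, then let later rows overwrite earlier ones.

    Hypothetical helper; correct as long as, within each key, rows end up
    ordered by ascending date.
    """
    return list({row[0]: row for row in sorted(rows, key=sort_key)}.values())


# Sort by date only.
by_date = latest_by_sorted_overwrite(L, lambda t: t[2])

# Sort by key first, then by date within each key.
by_key_then_date = latest_by_sorted_overwrite(L, lambda t: (t[0], t[2]))
```

Both sort orders place the latest row for each key last among the rows sharing that key, which is all the overwrite step relies on.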

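Or, better yet, go for Space_C0wb0y's answer (the other answer to this question), which makes a single pass over the list and does not need to sort it at all.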

Shawn Chin answered Dec 21 '22 20:12