Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique *while preserving order*? [duplicate]

For example:

>>> x = [1, 1, 2, 'a', 'a', 3] >>> unique(x) [1, 2, 'a', 3] 

Assume list elements are hashable.

Clarification: The result should keep the first duplicate in the list. For example, [1, 2, 3, 2, 3, 1] becomes [1, 2, 3].

like image 888
J Miller Avatar asked Sep 18 '08 01:09

J Miller


People also ask

How do I remove duplicates from a list and keep it in Python?

If you want to preserve the order while you remove duplicate elements from List in Python, you can use the OrderedDict class from the collections module. More specifically, we can use OrderedDict. fromkeys(list) to obtain a dictionary having duplicate elements removed, while still maintaining order.


2 Answers

def unique(items):     found = set()     keep = []      for item in items:         if item not in found:             found.add(item)             keep.append(item)                  return keep  print unique([1, 1, 2, 'a', 'a', 3]) 
like image 96
Terhorst Avatar answered Sep 20 '22 04:09

Terhorst


Using:

lst = [8, 8, 9, 9, 7, 15, 15, 2, 20, 13, 2, 24, 6, 11, 7, 12, 4, 10, 18, 13, 23, 11, 3, 11, 12, 10, 4, 5, 4, 22, 6, 3, 19, 14, 21, 11, 1, 5, 14, 8, 0, 1, 16, 5, 10, 13, 17, 1, 16, 17, 12, 6, 10, 0, 3, 9, 9, 3, 7, 7, 6, 6, 7, 5, 14, 18, 12, 19, 2, 8, 9, 0, 8, 4, 5] 

And using the timeit module:

$ python -m timeit -s 'import uniquetest' 'uniquetest.etchasketch(uniquetest.lst)' 

And so on for the various other functions (which I named after their posters), I have the following results (on my first generation Intel MacBook Pro):

Allen:                  14.6 µs per loop [1] Terhorst:               26.6 µs per loop Tarle:                  44.7 µs per loop ctcherry:               44.8 µs per loop Etchasketch 1 (short):  64.6 µs per loop Schinckel:              65.0 µs per loop Etchasketch 2:          71.6 µs per loop Little:                 89.4 µs per loop Tyler:                 179.0 µs per loop 

[1] Note that Allen modifies the list in place – I believe this has skewed the time, in that the timeit module runs the code 100000 times and 99999 of them are with the dupe-less list.


Summary: Straight-forward implementation with sets wins over confusing one-liners :-)

like image 42
John Fouhy Avatar answered Sep 21 '22 04:09

John Fouhy