One-liner to remove duplicates, keep ordering of list [duplicate]

Tags:

python

I have the following list:

['Herb', 'Alec', 'Herb', 'Don']

I want to remove duplicates while keeping the order, so the result would be:

['Herb', 'Alec', 'Don']

Here is how I would do this verbosely:

l_old = ['Herb', 'Alec', 'Herb', 'Don']
l_new = []
for item in l_old:
    if item not in l_new:
        l_new.append(item)

Is there a way to do this in a single line?

Asked Aug 17 '17 by David542


1 Answer

Using pandas, create a series from the list, drop duplicates, and then convert it back to a list.

>>> import pandas as pd
>>> pd.Series(['Herb', 'Alec', 'Herb', 'Don']).drop_duplicates().tolist()
['Herb', 'Alec', 'Don']
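
If you would rather not pull in pandas for this, the same one-liner idea works in plain Python; a minimal sketch, assuming Python 3.7+ where regular dicts preserve insertion order (on older versions, collections.OrderedDict.fromkeys behaves the same way):

# dict keys are unique and, from Python 3.7 on, insertion-ordered,
# so only the first occurrence of each name is kept.
names = ['Herb', 'Alec', 'Herb', 'Don']
deduped = list(dict.fromkeys(names))
print(deduped)  # ['Herb', 'Alec', 'Don']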

Timings

The sorted(set(my_list), key=my_list.index) approach suggested by @StefanPochmann is the clear winner for lists with high duplication.

from collections import OrderedDict

my_list = ['Herb', 'Alec', 'Don'] * 10000

%timeit pd.Series(my_list).drop_duplicates().tolist()
# 100 loops, best of 3: 3.11 ms per loop

%timeit list(OrderedDict().fromkeys(my_list))
# 100 loops, best of 3: 16.1 ms per loop

%timeit sorted(set(my_list), key=my_list.index)
# 1000 loops, best of 3: 396 µs per loop
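
These numbers come from IPython's %timeit magic; to reproduce the comparison as a plain script, here is a rough sketch using the standard timeit module (absolute timings will of course vary by machine and library versions):

# Standalone version of the comparison above using the timeit module
# instead of IPython's %timeit (the numbers reported above are illustrative).
import timeit
from collections import OrderedDict

import pandas as pd

my_list = ['Herb', 'Alec', 'Don'] * 10000

candidates = {
    'pandas drop_duplicates': "pd.Series(my_list).drop_duplicates().tolist()",
    'OrderedDict.fromkeys': "list(OrderedDict.fromkeys(my_list))",
    'sorted(set, key=index)': "sorted(set(my_list), key=my_list.index)",
}

for label, stmt in candidates.items():
    total = timeit.timeit(stmt, number=100, globals=globals())
    print(f"{label}: {total / 100 * 1000:.2f} ms per loop")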

For larger lists with no duplication (e.g. simply a range of numbers), the pandas solution stays fast, while the index-based approach degrades badly: my_list.index is a linear scan, so computing the sort key for every unique element makes it quadratic overall.

my_list = range(10000)

%timeit pd.Series(my_list).drop_duplicates().tolist()
# 100 loops, best of 3: 3.16 ms per loop

%timeit list(OrderedDict().fromkeys(my_list))
# 100 loops, best of 3: 10.8 ms per loop

%timeit sorted(set(my_list), key=my_list.index)
# 1 loop, best of 3: 716 ms per loop
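
If you want a single expression that stays roughly linear in both scenarios, the usual seen-set comprehension (not part of the original answer) is another option; a minimal sketch:

# Keeps the first occurrence of each element; relies on set.add() returning None,
# so the `or seen.add(x)` clause only runs (and is falsy) for unseen elements.
my_list = ['Herb', 'Alec', 'Herb', 'Don']
seen = set()
deduped = [x for x in my_list if not (x in seen or seen.add(x))]
# deduped == ['Herb', 'Alec', 'Don']
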
Answered Sep 20 '22 by Alexander