
Fastest Way To Remove Duplicates In Lists Python

I have two very large lists. Looping through them once takes at least a second, and I need to do it 200,000 times. What's the fastest way to combine the two lists into one with the duplicates removed?

Cookies asked Nov 04 '09 17:11


1 Answer

This is the fastest way I can think of:

import itertools
# chain() walks both lists in a single pass; set() keeps only the unique items
output_list = list(set(itertools.chain(first_list, second_list)))
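As a minimal, runnable sketch of the snippet above: the sample lists here are made-up stand-ins for the real data, and the `union()` variant is an equivalent spelling that needs no import, not a claim about which one is faster on your data.

```python
import itertools

# Hypothetical sample data standing in for the two large lists
first_list = [1, 2, 3, 2]
second_list = [3, 4, 1, 5]

# Same approach as above: chain the lists, let set() drop the duplicates
via_chain = set(itertools.chain(first_list, second_list))

# Equivalent without itertools: set.union() accepts any iterable
via_union = set(first_list).union(second_list)

print(via_chain == via_union)  # True
print(sorted(via_chain))       # [1, 2, 3, 4, 5]
```

Both versions require the list items to be hashable (numbers, strings, tuples of hashables, and so on).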

Slight update: As jcd points out, depending on your application, you probably don't need to convert the result back to a list. Since a set is iterable by itself, you might be able to just use it directly:

output_set = set(itertools.chain(first_list, second_list))
for item in output_set:
    pass  # do something with item

Beware, though, that a set is unordered, so any solution built on set() gives no guarantee that the elements will come out in any particular order. That said, since you're combining two lists, it's hard to come up with a good reason why you would need a particular ordering over them anyway, so this is probably not something you need to worry about.
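If the order does turn out to matter, one possible alternative (not part of the original answer) is to deduplicate through dict keys instead of a set. This sketch assumes Python 3.7 or later, where dicts are guaranteed to keep insertion order; the sample lists are hypothetical.

```python
import itertools

# Hypothetical sample data
first_list = ["b", "a", "b", "c"]
second_list = ["c", "d", "a"]

# dict keys are unique and (in Python 3.7+) keep insertion order,
# so each item stays at the position of its first occurrence
ordered_unique = list(dict.fromkeys(itertools.chain(first_list, second_list)))

print(ordered_unique)  # ['b', 'a', 'c', 'd']
```

As with set(), the items have to be hashable for this to work.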

Daniel Pryden answered Sep 24 '22 07:09