Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the fastest method for checking for duplicates in python?

Using a dictionary seems ideal.

e.g.:

history = {}
for i in collection:
    if i not in history:
        history[i] = None
        # fancy computation here

Would using the set() type be just as fast; set() would not require me to add silly None values to the hash keys.

like image 653
TheOne Avatar asked Dec 27 '22 01:12

TheOne


1 Answers

Yes, you should use a set.


Would using the set() type be just as fast;

No, it won't be just as fast. It will be faster.


Update

Some people have posted benchmarks showing that set is slower than dict. I think this is a bit surprising since they basically have the same underlying implementation except that set is simpler. I think that I have found the reason for the slowness:

def set_way():
    my_set = set()
    my_set_add = my_set.add   # remember the method
    for ele in x:
        if ele not in my_set:
            my_set_add(ele)   # call the method directly

Results:

dict time : 1.896939858077399
set time : 1.8587076107880456

Set is now slightly faster, as expected.

like image 72
Mark Byers Avatar answered Jan 03 '23 11:01

Mark Byers