If I have a python list that is has many duplicates, and I want to iterate through each item, but not through the duplicates, is it best to use a set (as in set(mylist)
, or find another way to create a list without duplicates? I was thinking of just looping through the list and checking for duplicates but I figured that's what set()
does when it's initialized.
So if mylist = [3,1,5,2,4,4,1,4,2,5,1,3]
and I really just want to loop through [1,2,3,4,5]
(order doesn't matter), should I use set(mylist)
or something else?
An alternative is possible in the last example, since the list contains every integer between its min and max value, I could loop through range(min(mylist),max(mylist))
or through set(mylist)
. Should I generally try to avoid using set in this case? Also, would finding the min
and max
be slower than just creating the set
?
In the case in the last example, the set
is faster:
from numpy.random import random_integers ids = random_integers(1e3,size=1e6) def set_loop(mylist): idlist = [] for id in set(mylist): idlist.append(id) return idlist def list_loop(mylist): idlist = [] for id in range(min(mylist),max(mylist)): idlist.append(id) return idlist %timeit set_loop(ids) #1 loops, best of 3: 232 ms per loop %timeit list_loop(ids) #1 loops, best of 3: 408 ms per loop
Iterating over a List is much much faster than iterating over a set. The currently accepted answer is using a very small set and list and hence, the difference is negligible there.
Lists are slightly faster than sets when you just want to iterate over the values. Sets, however, are significantly faster than lists if you want to check if an item is contained within it. They can only contain unique items though.
Because sets cannot have multiple occurrences of the same element, it makes sets highly useful to efficiently remove duplicate values from a list or tuple and to perform common math operations like unions and intersections.
Because of differences in how Python implements for loops and list comprehension, list comprehensions are almost always faster than for loops when performing operations.
Just use a set
. Its semantics are exactly what you want: a collection of unique items.
Technically you'll be iterating through the list twice: once to create the set, once for your actual loop. But you'd be doing just as much work or more with any other approach.
set
is what you want, so you should use set
. Trying to be clever introduces subtle bugs like forgetting to add one tomax(mylist)
! Code defensively. Worry about what's faster when you determine that it is too slow.
range(min(mylist), max(mylist) + 1) # <-- don't forget to add 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With