Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Better/Faster to Loop through set or list?

If I have a python list that is has many duplicates, and I want to iterate through each item, but not through the duplicates, is it best to use a set (as in set(mylist), or find another way to create a list without duplicates? I was thinking of just looping through the list and checking for duplicates but I figured that's what set() does when it's initialized.

So if mylist = [3,1,5,2,4,4,1,4,2,5,1,3] and I really just want to loop through [1,2,3,4,5] (order doesn't matter), should I use set(mylist) or something else?

An alternative is possible in the last example, since the list contains every integer between its min and max value, I could loop through range(min(mylist),max(mylist)) or through set(mylist). Should I generally try to avoid using set in this case? Also, would finding the min and max be slower than just creating the set?


In the case in the last example, the set is faster:

from numpy.random import random_integers ids = random_integers(1e3,size=1e6)  def set_loop(mylist):     idlist = []     for id in set(mylist):         idlist.append(id)     return idlist  def list_loop(mylist):     idlist = []     for id in range(min(mylist),max(mylist)):         idlist.append(id)     return idlist  %timeit set_loop(ids) #1 loops, best of 3: 232 ms per loop  %timeit list_loop(ids) #1 loops, best of 3: 408 ms per loop 
like image 425
askewchan Avatar asked Feb 27 '13 00:02

askewchan


People also ask

Is it faster to iterate through list or set?

Iterating over a List is much much faster than iterating over a set. The currently accepted answer is using a very small set and list and hence, the difference is negligible there.

Is iterating through a set faster than a list Python?

Lists are slightly faster than sets when you just want to iterate over the values. Sets, however, are significantly faster than lists if you want to check if an item is contained within it. They can only contain unique items though.

Are sets more efficient than lists?

Because sets cannot have multiple occurrences of the same element, it makes sets highly useful to efficiently remove duplicate values from a list or tuple and to perform common math operations like unions and intersections.

Are list comprehensions faster than for loops?

Because of differences in how Python implements for loops and list comprehension, list comprehensions are almost always faster than for loops when performing operations.


2 Answers

Just use a set. Its semantics are exactly what you want: a collection of unique items.

Technically you'll be iterating through the list twice: once to create the set, once for your actual loop. But you'd be doing just as much work or more with any other approach.

like image 194
Eevee Avatar answered Sep 29 '22 12:09

Eevee


set is what you want, so you should use set. Trying to be clever introduces subtle bugs like forgetting to add one tomax(mylist)! Code defensively. Worry about what's faster when you determine that it is too slow.

range(min(mylist), max(mylist) + 1)  # <-- don't forget to add 1 
like image 43
John La Rooy Avatar answered Sep 29 '22 13:09

John La Rooy