Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why use a set for list comparisons?

Tags:

python

I just read another users question while looking for a way to compute the differences in two lists.

Python, compute list difference

My question is why would I do

def diff(a,b):
    b = set(b)
    return [aa for aa in a if aa not in b]

rather than doing

def diff(a,b):
    tmp = []
    for i in a:
        if(i not in b):
            tmp.append(i)
return tmp

edit: just noticed the second diff function actually returned the similarities. It should be correct now.

like image 706
Jake Avatar asked May 09 '12 17:05

Jake


People also ask

What is the advantage of set VS list?

Lists are slightly faster than sets when you just want to iterate over the values. Sets, however, are significantly faster than lists if you want to check if an item is contained within it. They can only contain unique items though.

Why would you use a set in Python?

Sets are used to store multiple items in a single variable. Set is one of 4 built-in data types in Python used to store collections of data, the other 3 are List, Tuple, and Dictionary, all with different qualities and usage. A set is a collection which is unordered, unchangeable*, and unindexed.

Why are sets faster than lists?

Sets cannot contain duplicates, and they will simply disappear. Sets use hashing to perform look ups which makes them way faster than lists in this regard.


1 Answers

Just from an algorithmic perspective, it takes O(n) to construct the set and O(n) to do the list comprehension (since testing if an element is contained in a set is O(1)). However in the second example, it would take O(n^2) to loop through both lists. So regardless of the programming language, the first approach is superior.

Also, list comprehensions in python are inherently faster than a for loop. This reduces the constant factor even more (and significantly so too). The reason why can be summarized in this post which I quote here:

The fact that list comprehensions can only be comprised of expressions, not statements, is a considerable factor, as much less work is required behind the scenes for each iteration. Another factor is that the underlying iteration mechanism for list comprehensions is much closer to a C loop than execution of a for loop.

like image 84
tskuzzy Avatar answered Sep 22 '22 23:09

tskuzzy