Say I wish to compute the difference of two lists, C = A - B:
A = [1,2,3,4,5,6,7,8,9]
B = [1,3,5,8,9]
C = [2,4,6,7] #Result
A and B are both sorted and contain unique integers (not sure if there is a way to tell Python about this property of the lists). I need to preserve the order of the elements. AFAIK there are two possible ways of doing it:
Method 1: Convert B into a set and use list comprehension to generate C:
s = set(B)
C = [x for x in A if x not in s]
Method 2: Directly use list comprehension:
C = [x for x in A if x not in B]
Why is #1 more efficient than #2? Isn't there an overhead in converting to a set? What am I missing here?
Some performance benchmarks are given in this answer.
UPDATE: I'm aware that a set's average O(1) lookup time beats a list's O(n), but if the original list A contains about a million or so integers, wouldn't the set creation actually take longer?
Generally, lists are faster than sets for most operations. But when you are searching for an element in a collection, sets are faster because they are implemented with hash tables: Python does not have to scan the whole set, so the average time complexity of a membership test is O(1).
Because sets cannot contain multiple occurrences of the same element, they are also highly useful for efficiently removing duplicate values from a list or tuple and for performing common math operations like unions and intersections.
As soon as you start testing membership of elements that are in the middle or at the end of the collection, sets perform 40%–1800% better than lists or tuples. You now have a fair idea why you should think of using sets for large collections.
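To make that concrete, here is a minimal sketch (not a rigorous benchmark; the exact numbers depend on your machine and Python version) that times a single membership test against a million-element list and against a set built from the same data:

import timeit

data = list(range(1_000_000))
data_set = set(data)

# Worst case for the list: the element we look up is the very last one.
needle = 999_999

list_time = timeit.timeit(lambda: needle in data, number=100)
set_time = timeit.timeit(lambda: needle in data_set, number=100)

print(f"list lookup: {list_time:.4f}s for 100 lookups")
print(f"set lookup:  {set_time:.4f}s for 100 lookups")

The list time grows with the size of the list, while the set time stays essentially flat.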
There is overhead in converting a list to a set, but a set is substantially faster than a list for those "in" membership tests.
You can instantly see whether item x is in set y because there's a hash table being used underneath. No matter how large your set is, the lookup time stays (roughly) the same - this is known in Big-O notation as O(1). For a list, you have to check the elements one by one to see whether item x is in list z. As your list grows, the check takes longer - this is O(n), meaning the length of the operation is directly tied to how long the list is.
That increased speed can offset the set creation overhead, which is how your set check ends up being faster.
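As a rough sketch of both effects together (the one-off conversion cost plus the per-element lookup cost), something like the following can be used; the sizes are arbitrary and the exact figures will vary by machine:

import timeit

A = list(range(10_000))
B = list(range(0, 10_000, 2))   # every other element of A

def method1():
    s = set(B)                            # one-off O(len(B)) conversion
    return [x for x in A if x not in s]   # O(1) average lookup per element

def method2():
    return [x for x in A if x not in B]   # O(len(B)) scan per element

print("method 1 (set):", timeit.timeit(method1, number=100))
print("method 2 (list):", timeit.timeit(method2, number=100))

Even with the set built inside the timed function, method 1 typically comes out well ahead at these sizes.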
EDIT: to answer that other question, Python has no way of determining that your list is sorted - not if you're using a standard list object, anyway. So it can't achieve O(log n) performance with a list comprehension. If you wanted to write your own binary search method which assumes the list is sorted, you can certainly do so, but O(1) beats O(log n) any day.
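For completeness, here is a minimal sketch of such a binary-search membership test built on the standard-library bisect module (the in_sorted helper is just an illustrative name, not something the standard library provides):

from bisect import bisect_left

def in_sorted(sorted_list, x):
    # Binary search: O(log n) membership test on a sorted list.
    i = bisect_left(sorted_list, x)
    return i < len(sorted_list) and sorted_list[i] == x

A = [1, 2, 3, 4, 5, 6, 7, 8, 9]
B = [1, 3, 5, 8, 9]
C = [x for x in A if not in_sorted(B, x)]
print(C)   # [2, 4, 6, 7]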
EDIT 2:
I'm aware that a set's average O(1) lookup time beats that of a list's O(n) but if the original list A contains about a million or so integers, wouldn't the set creation actually take longer?
No, not at all. Creating a set out of a list is an O(n) operation, since inserting an item into a set is O(1) and you're doing that n times. If you have a list with a million integers in it, the set-based approach involves two O(n) steps (building the set, then scanning A with O(1) lookups), while repeatedly scanning the list is n O(n) steps. In practice, the set-based approach is going to be roughly 250,000 times faster for lists with a million integers, and the speed difference will grow larger and larger the more items you have in your list.
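To put the "wouldn't the set creation take longer?" worry in perspective, a quick sketch like this (numbers are machine-dependent) shows that building the set costs the same order of magnitude as a single full scan of the list - and Method 2 pays roughly one such scan for every element of A:

import timeit

B = list(range(1_000_000))

build_set = timeit.timeit(lambda: set(B), number=10)
one_scan = timeit.timeit(lambda: -1 in B, number=10)   # -1 is absent, so the scan visits every element

print(f"building set(B): {build_set / 10:.4f}s per call")
print(f"one full list scan: {one_scan / 10:.4f}s per call")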
Average time complexity for lookup (x in S) in a set is O(1) while the same for a list is O(n).
You can check the details at https://wiki.python.org/moin/TimeComplexity