Tuple or list when using 'in' in an 'if' clause?

Tags:

python

list

optimization

python-internals

tuples

Which approach is better? Using a tuple, like:

if number in (1, 2):

or a list, like:

if number in [1, 2]:

Which one is recommended for such uses and why (both logically and performance-wise)?

659

asked Aug 18 '14 16:08

linkyndy

1 Answer

The CPython interpreter replaces the second form with the first.

That's because loading the tuple from a constant is one operation, whereas the list would take 3 operations: load the two integers, then build a new list object.

Because you are using a list literal that isn't otherwise reachable, a tuple is substituted for it:

>>> import dis
>>> dis.dis(compile('number in [1, 2]', '<stdin>', 'eval'))
  1           0 LOAD_NAME                0 (number)
              3 LOAD_CONST               2 ((1, 2))
              6 COMPARE_OP               6 (in)
              9 RETURN_VALUE

Here the second bytecode loads a (1, 2) tuple as a constant, in one step. Compare this to creating a list object not used in a membership test:

>>> dis.dis(compile('[1, 2]', '<stdin>', 'eval'))
  1           0 LOAD_CONST               0 (1)
              3 LOAD_CONST               1 (2)
              6 BUILD_LIST               2
              9 RETURN_VALUE

Here N+1 steps are required for a list object of length N.
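As a rough way to check this on your own interpreter, the sketch below lists the opcode names emitted for both expressions. The helper name `opnames` is made up for this illustration, and the exact opcodes differ between CPython versions, so treat it as a probe rather than a specification.

```python
import dis


def opnames(source):
    """Return the opcode names the running interpreter emits for an expression."""
    code = compile(source, "<example>", "eval")
    return [instruction.opname for instruction in dis.get_instructions(code)]


# Membership test against a list literal: on CPython no list is built at run time.
print(opnames("number in [1, 2]"))

# Bare list literal: a BUILD_LIST instruction is still needed.
print(opnames("[1, 2]"))
```

If `BUILD_LIST` shows up in the first listing, the interpreter in use does not perform the substitution, and writing the tuple explicitly avoids the extra work.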

This substitution is a CPython-specific peephole optimisation; see the Python/peephole.c source. Other Python implementations may not perform it, so there you want to write the tuple yourself rather than rely on the substitution.

That said, the best option when using Python 3.2 and up is to use a set literal:

if number in {1, 2}:

as the peephole optimiser will replace that with a frozenset() object, and membership tests against sets take O(1) (constant) time on average:

>>> dis.dis(compile('number in {1, 2}', '<stdin>', 'eval'))
  1           0 LOAD_NAME                0 (number)
              3 LOAD_CONST               2 (frozenset({1, 2}))
              6 COMPARE_OP               6 (in)
              9 RETURN_VALUE
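One way to confirm the substitution without reading bytecode is to look at the constants stored on the compiled code object. This is a small sketch that assumes CPython 3.2 or later; `frozenset_constants` is a hypothetical helper name, and other implementations may not store such a constant at all.

```python
def frozenset_constants(source):
    """Return the frozenset constants stored on the compiled expression."""
    code = compile(source, "<example>", "eval")
    return [const for const in code.co_consts if isinstance(const, frozenset)]


# On CPython 3.2+ the set literal in a membership test is stored as a constant.
print(frozenset_constants("number in {1, 2}"))
```

If the list comes back empty, the interpreter builds the set on every evaluation, and a tuple is likely the cheaper choice.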

This optimization was added in Python 3.2 but wasn't backported to Python 2.

As such, the Python 2 optimiser doesn't recognize this option, and building either a set or frozenset from the contents on each test is almost guaranteed to cost more than using a tuple.

Set membership tests are O(1) and fast; testing against a tuple is O(n) in the worst case. Although testing against a set has to calculate the hash (a higher constant cost, though the hash is cached for some immutable types such as strings), matching any tuple element other than the first is going to cost more, because each preceding element has to be compared in turn. So on average, sets are easily faster:

>>> import timeit
>>> timeit.timeit('1 in (1, 3, 5)', number=10**7)  # best-case for tuples
0.21154764899984002
>>> timeit.timeit('8 in (1, 3, 5)', number=10**7)  # worst-case for tuples
0.5670104179880582
>>> timeit.timeit('1 in {1, 3, 5}', number=10**7)  # average-case for sets
0.2663505630043801
>>> timeit.timeit('8 in {1, 3, 5}', number=10**7)  # worst-case for sets
0.25939063701662235
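These numbers depend on the machine and the Python version. To reproduce the comparison locally, something like the sketch below can be used. The helper name `best_time` and the repeat counts are arbitrary choices for illustration, not part of the measurements above.

```python
import timeit


def best_time(expression, repeats=5, number=100_000):
    """Return the fastest of several timing runs, in seconds."""
    return min(timeit.repeat(expression, repeat=repeats, number=number))


candidates = [
    "1 in (1, 3, 5)",  # tuple, match on the first element
    "8 in (1, 3, 5)",  # tuple, no match (every element compared)
    "1 in {1, 3, 5}",  # set, match
    "8 in {1, 3, 5}",  # set, no match
]

results = {expression: best_time(expression) for expression in candidates}
for expression, seconds in results.items():
    print(f"{expression:<16} {seconds:.6f} s")
```

Taking the minimum of several runs reduces noise from other processes; the absolute values matter less than how the tuple and set rows compare.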

answered Oct 02 '22 19:10

Martijn Pieters
