In Python, it is possible to chain comparison operators like this:
a op b op c
which is evaluated as
a op b and b op c
with the only difference being that b is evaluated only once (so it behaves more like t = b; a op t and t op c, where b's expression is evaluated just once). This is advantageous because it is very readable and more concise than the equivalent version written with an explicit conjunction (using and).
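The single evaluation of the middle operand is easy to observe when it has a side effect; here is a quick illustration (noisy_b is just a made-up helper for the demo):

def noisy_b():
    print("evaluating b")
    return 2

1 <= noisy_b() <= 3                  # prints "evaluating b" once, result is True
1 <= noisy_b() and noisy_b() <= 3    # prints "evaluating b" twice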
However, I've noticed that there is a minor performance difference between a chained expression and its expanded equivalent, whether there are 3 operands or 20. It becomes apparent when you time these operations.
>>> import timeit
>>> timeit.timeit("a <= b <= c", setup="a,b,c=1,2,3")
0.1086414959972899
>>> timeit.timeit("a <= b and b <= c", setup="a,b,c=1,2,3")
0.09434155100097996
And,
timeit.timeit("a <= b <= c <= d <= e <= f", setup="a,b,c,d,e,f=1,2,3,4,5,6")
0.2151330839988077
timeit.timeit("a <= b and b <= c and c <= d and d <= e and e <= f", setup="a,b,c,d,e,f=1,2,3,4,5,6")
0.19196406500122976
Note: all tests were done with Python 3.4.
Examining the bytecode for both expressions, I noticed that the chained version executes more instructions (four more, in fact) than the explicit one.
>>> import dis
>>> dis.dis("a <= b <= c")
1 0 LOAD_NAME 0 (a)
3 LOAD_NAME 1 (b)
6 DUP_TOP
7 ROT_THREE
8 COMPARE_OP 1 (<=)
11 JUMP_IF_FALSE_OR_POP 21
14 LOAD_NAME 2 (c)
17 COMPARE_OP 1 (<=)
20 RETURN_VALUE
>> 21 ROT_TWO
22 POP_TOP
23 RETURN_VALUE
Contrast this with,
dis.dis("a <= b and b <= c")
1 0 LOAD_NAME 0 (a)
3 LOAD_NAME 1 (b)
6 COMPARE_OP 1 (<=)
9 JUMP_IF_FALSE_OR_POP 21
12 LOAD_NAME 1 (b)
15 LOAD_NAME 2 (c)
18 COMPARE_OP 1 (<=)
>> 21 RETURN_VALUE
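If you'd rather not count by hand, dis.get_instructions (available since Python 3.4) will tally the instructions for you; a quick sketch covering both the short and the long forms:

import dis

exprs = [
    "a <= b <= c",
    "a <= b and b <= c",
    "a <= b <= c <= d <= e <= f",
    "a <= b and b <= c and c <= d and d <= e and e <= f",
]
for expr in exprs:
    # dis.get_instructions accepts a source string and yields Instruction objects
    count = len(list(dis.get_instructions(expr)))
    print("{:55s} {:2d} instructions".format(expr, count))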
I am not experienced at reading bytecode, but the first snippet clearly executes more instructions than the second.
Here's how I've interpreted this. In the first case, the operands are pushed onto a stack and popped off for each comparison, and each operand is loaded only once. In the second case there is no such stack shuffling, but at least (N - 2) of the operands have to be loaded twice. It appears that the extra stack manipulation is more expensive than loading (N - 2) variables a second time, which would account for the speed difference.
In a nutshell, I'm trying to understand why one form is consistently slower than the other by a roughly constant factor. Is my hypothesis correct, or is there something more to the Python internals that I'm missing?
More benchmarks (timeit totals in seconds, one million executions each):
| Python | a <= b <= c | a <= b and b <= c | a <= b <= ... <= e <= f | a <= b and ... and e <= f | Credit |
|--------|---------------------|---------------------|-------------------------|---------------------------|----------------|
| 3.4 | 0.1086414959972899 | 0.09434155100097996 | 0.2151330839988077 | 0.19196406500122976 | @cᴏʟᴅsᴘᴇᴇᴅ |
| 3.6.2 | 0.06788300536572933 | 0.059271858073771 | 0.1505890181288123 | 0.12044331897050142 | @Bailey Parker |
| 2.7.10 | 0.05009198188781738 | 0.04472208023071289 | 0.11113405227661133 | 0.09062719345092773 | @Bailey Parker |
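To add a row for your own interpreter, something along these lines should work (a reproduction sketch of my own, not code from the posts above; it takes the minimum over several timeit.repeat runs to reduce noise):

import sys
import timeit

cases = [
    ("a <= b <= c", "a,b,c=1,2,3"),
    ("a <= b and b <= c", "a,b,c=1,2,3"),
    ("a <= b <= c <= d <= e <= f", "a,b,c,d,e,f=1,2,3,4,5,6"),
    ("a <= b and b <= c and c <= d and d <= e and e <= f", "a,b,c,d,e,f=1,2,3,4,5,6"),
]

print(sys.version)
for stmt, setup in cases:
    # best of 5 runs, 1,000,000 executions each (the timeit.timeit default number)
    best = min(timeit.repeat(stmt, setup=setup, repeat=5, number=1000000))
    print("{:55s} {:.6f}".format(stmt, best))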
The comparison operators are <, <=, >, >=, ==, !=, is, is not, in, and not in. They all share the same precedence, which is lower than that of the arithmetic, shifting, and bitwise operators, and they can be chained arbitrarily.
The Python documentation on Comparisons says: "Comparisons can be chained arbitrarily, e.g., x < y <= z is equivalent to x < y and y <= z, except that y is evaluated only once (but in both cases z is not evaluated at all when x < y is found to be false)."
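The short-circuiting behaviour mentioned there is just as easy to observe (another small illustration; noisy_z is a made-up helper):

def noisy_z():
    print("evaluating z")
    return 0

print(3 < 2 <= noisy_z())   # prints False; noisy_z() is never called because 3 < 2 already failed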
In CPython's stack-based bytecode execution engine, saving an extra reference to b for the chained comparison isn't free. It's at the "seriously, don't worry about it" level of cheap, but it's not literally free, and you're comparing it to the slightly cheaper operation of loading a local variable.
The COMPARE_OP opcode removes the objects it's comparing from the stack, so for the chained comparison, Python has to create another reference to b (DUP_TOP) and shove it two places down in the stack (ROT_THREE) to get it out of the way.
In a <= b and b <= c, instead of the above reference shuffling, Python just copies another reference to b out of the stack frame's fastlocals array. This involves less pointer shuffling and one less trip around the bytecode evaluation loop, so it's slightly cheaper.
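Because timeit compiles the statement into the body of a throwaway function, the names become local variables and are fetched with LOAD_FAST out of that fastlocals array. You can see the same contrast by disassembling small functions instead of source strings (just an illustrative sketch):

import dis

def chained(a, b, c):
    return a <= b <= c        # keeps b around with DUP_TOP / ROT_THREE

def explicit(a, b, c):
    return a <= b and b <= c  # fetches b a second time with LOAD_FAST

dis.dis(chained)
dis.dis(explicit)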