Short version: If `s` is a string, then `s = s + 'c'` might modify the string in place, while `t = s + 'c'` can't. But how does the operation `s + 'c'` know which scenario it's in?
Long version: `t = s + 'c'` needs to create a separate string because the program afterwards wants both the old string as `s` and the new string as `t`. `s = s + 'c'` can modify the string in place if `s` is the only reference, as the program only wants `s` to be the extended string. CPython actually does this optimization, if there's space at the end for the extra character.
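The "both strings are wanted" case is directly observable; a minimal sketch:

```python
s = 'ab'
t = s + 'c'       # must build a new string: s is still needed afterwards

assert s == 'ab'  # the original string is unchanged
assert t == 'abc' # the new, extended string
```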
Consider these functions, which repeatedly add a character:

```python
def fast(n):
    s = ''
    for _ in range(n):
        s = s + 'c'
        t = s
        del t

def slow(n):
    s = ''
    for _ in range(n):
        t = s + 'c'
        s = t
        del t
```
Benchmark results with `n = 100_000` (Try it online!):

```
fast :   9 ms    9 ms    9 ms    9 ms   10 ms
slow : 924 ms  927 ms  931 ms  933 ms  945 ms
```
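The timings can be reproduced with the standard `timeit` module; a minimal sketch (`n` is scaled down so the quadratic version still finishes quickly, and absolute numbers will vary by machine and CPython version):

```python
import timeit

def fast(n):
    s = ''
    for _ in range(n):
        s = s + 'c'
        t = s
        del t

def slow(n):
    s = ''
    for _ in range(n):
        t = s + 'c'
        s = t
        del t

# Time one call of each; on CPython 3.10, fast() scales roughly
# linearly in n while slow() scales roughly quadratically.
for f in (fast, slow):
    seconds = timeit.timeit(lambda: f(10_000), number=1)
    print(f'{f.__name__}: {seconds * 1000:.0f} ms')
```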
Note that the extra `t = s` or `s = t` makes both variables equivalent references to the string, and then `del t` leaves only `s`, so at the next loop iteration, `s` is again the only reference to the string. Thus the only difference between the two functions is the order in which `s + 'c'` is assigned to `s` and `t`.
Let's also disassemble the bytecode. I marked the only three differences with `!=` in the middle. As expected, only the variables for `STORE_FAST` and `LOAD_FAST` differ. But up to and including the `BINARY_ADD`, the bytecode is identical. So how does the `BINARY_ADD` know whether to optimize or not?
```
import dis                                  import dis
dis.dis(fast)                               dis.dis(slow)
---------------------------------------------------------------------------
      0 LOAD_CONST     1 ('')                     0 LOAD_CONST     1 ('')
      2 STORE_FAST     1 (s)                      2 STORE_FAST     1 (s)
      4 LOAD_GLOBAL    0 (range)                  4 LOAD_GLOBAL    0 (range)
      6 LOAD_FAST      0 (n)                      6 LOAD_FAST      0 (n)
      8 CALL_FUNCTION  1                          8 CALL_FUNCTION  1
     10 GET_ITER                                 10 GET_ITER
>>   12 FOR_ITER      18 (to 32)            >>   12 FOR_ITER      18 (to 32)
     14 STORE_FAST     2 (_)                     14 STORE_FAST     2 (_)
     16 LOAD_FAST      1 (s)                     16 LOAD_FAST      1 (s)
     18 LOAD_CONST     2 ('c')                   18 LOAD_CONST     2 ('c')
     20 BINARY_ADD                               20 BINARY_ADD
     22 STORE_FAST     1 (s)          !=         22 STORE_FAST     3 (t)
     24 LOAD_FAST      1 (s)          !=         24 LOAD_FAST      3 (t)
     26 STORE_FAST     3 (t)          !=         26 STORE_FAST     1 (s)
     28 DELETE_FAST    3 (t)                     28 DELETE_FAST    3 (t)
     30 JUMP_ABSOLUTE 12                         30 JUMP_ABSOLUTE 12
>>   32 LOAD_CONST     0 (None)             >>   32 LOAD_CONST     0 (None)
     34 RETURN_VALUE                             34 RETURN_VALUE
```
Here's the code in question, from the Python 3.10 branch (in `ceval.c`, and called from the same file's implementation of the `BINARY_ADD` opcode). As @jasonharper noted in a comment, it peeks ahead to see whether the result of the `BINARY_ADD` will next be bound to the same name from which the left-hand addend came. In `fast()`, it is (operand came from `s` and result stored into `s`), but in `slow()` it isn't (operand came from `s` but stored into `t`).
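A consequence worth noting: the in-place path also requires that the left operand have no other live references, so even with the `s = s + 'c'` shape, keeping an extra alias across the concatenation should defeat the optimization on CPython 3.10 (a hedged sketch; `aliased` is a hypothetical name, and the performance claim is implementation-dependent):

```python
def aliased(n):
    s = ''
    for _ in range(n):
        keep = s     # a second live reference to the current string
        s = s + 'c'  # refcount of the old string is now > 1, so CPython
                     # cannot resize it in place; expect slow()-like timing
        del keep
    return s
```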
There's no guarantee this optimization will persist, though. For example, I noticed that your `fast()` is no faster than your `slow()` on the current development CPython `main` branch (which is the current work-in-progress toward an eventual 3.11 release).
As noted, there's no guarantee this optimization will persist. "Serious" Python programmers should know better than to rely on dodgy CPython-specific tricks, and, indeed, PEP 8 explicitly warns against relying on this specific one:

> Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).
>
> For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form `a += b` or `a = a + b`. ...
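The portable way to get linear-time behavior on any Python implementation is the idiom PEP 8 points toward: collect the pieces in a list and join once at the end. A minimal sketch (`build` is a hypothetical name):

```python
def build(n):
    parts = []
    for _ in range(n):
        parts.append('c')  # list append is amortized O(1)
    return ''.join(parts)  # one pass over the total length at the end
```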