Short version: If s is a string, then s = s + 'c' might modify the string in place, while t = s + 'c' can't. But how does the operation s + 'c' know which scenario it's in?
Long version:
t = s + 'c' needs to create a separate string, because the program afterwards wants both the old string (as s) and the new string (as t).
s = s + 'c' can modify the string in place if s holds the only reference, since the program only wants s to end up as the extended string. CPython actually does this optimization when there's space at the end of the string's memory block for the extra character.
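One half of this is easy to observe: whatever the implementation does, t = s + 'c' must produce a new object, because s still needs its old value (the join here is just an assumed way to get a fresh string with a single reference, avoiding interning and constant folding):

s = ''.join(['c'] * 8)  # fresh string; s is the only reference
t = s + 'c'             # must be a new object: s keeps its old value
print(t is s)           # False, always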
Consider these functions, which repeatedly add a character:
def fast(n):
    s = ''
    for _ in range(n):
        s = s + 'c'
        t = s
        del t

def slow(n):
    s = ''
    for _ in range(n):
        t = s + 'c'
        s = t
        del t
Benchmark results with n = 100_000 (Try it online!):
fast :   9 ms   9 ms   9 ms   9 ms  10 ms
slow : 924 ms 927 ms 931 ms 933 ms 945 ms
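For reference, a minimal harness along these lines reproduces the measurement (a sketch, assuming the fast and slow definitions above; the original numbers came from the linked Try it online! setup):

import timeit

n = 100_000
for func in (fast, slow):
    # number=1 per repetition, since each call already loops n times itself
    times = sorted(timeit.repeat(lambda: func(n), number=1, repeat=5))
    print(func.__name__, ':', ' '.join(f'{t * 1000:.0f} ms' for t in times))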
Note that the extra t = s or s = t makes both variables equivalent references to the string and then del t leaves only s, so at the next loop iteration, s is again the only reference to the string. Thus the only difference between the two functions is the order in which s + 'c' is assigned to s and t.
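You can watch the reference count do exactly that with sys.getrefcount (a small illustration; the join is again just an assumed way to build a fresh string that isn't interned or cached as a constant):

import sys

s = ''.join(['c'] * 8)     # fresh string with no hidden references
print(sys.getrefcount(s))  # 2: the variable s plus getrefcount's own argument
t = s
print(sys.getrefcount(s))  # 3: s, t, and the argument
del t
print(sys.getrefcount(s))  # 2 again: s is once more the only reference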
Let's also disassemble the bytecode. I marked the only three differences with != in the middle. As expected, only the variables for STORE_FAST and LOAD_FAST differ. But up to and including the BINARY_ADD, the bytecode is identical. So how does the BINARY_ADD know whether to optimize or not?
import dis                                import dis
dis.dis(fast)                             dis.dis(slow)
---------------------------------------------------------------------------
    0 LOAD_CONST     1 ('')                  0 LOAD_CONST     1 ('')
    2 STORE_FAST     1 (s)                   2 STORE_FAST     1 (s)
    4 LOAD_GLOBAL    0 (range)               4 LOAD_GLOBAL    0 (range)
    6 LOAD_FAST      0 (n)                   6 LOAD_FAST      0 (n)
    8 CALL_FUNCTION  1                       8 CALL_FUNCTION  1
   10 GET_ITER                              10 GET_ITER
>> 12 FOR_ITER      18 (to 32)            >> 12 FOR_ITER      18 (to 32)
   14 STORE_FAST     2 (_)                  14 STORE_FAST     2 (_)
   16 LOAD_FAST      1 (s)                  16 LOAD_FAST      1 (s)
   18 LOAD_CONST     2 ('c')                18 LOAD_CONST     2 ('c')
   20 BINARY_ADD                            20 BINARY_ADD
   22 STORE_FAST     1 (s)          !=      22 STORE_FAST     3 (t)
   24 LOAD_FAST      1 (s)          !=      24 LOAD_FAST      3 (t)
   26 STORE_FAST     3 (t)          !=      26 STORE_FAST     1 (s)
   28 DELETE_FAST    3 (t)                  28 DELETE_FAST    3 (t)
   30 JUMP_ABSOLUTE 12                      30 JUMP_ABSOLUTE 12
>> 32 LOAD_CONST     0 (None)             >> 32 LOAD_CONST     0 (None)
   34 RETURN_VALUE                          34 RETURN_VALUE
The code in question is the unicode_concatenate function in ceval.c on the Python 3.10 branch, called from the same file's implementation of the BINARY_ADD opcode. As @jasonharper noted in a comment, it peeks ahead to see whether the result of the BINARY_ADD will next be bound to the same name from which the left-hand addend came. In fast(), it is (the operand came from s and the result is stored back into s), but in slow() it isn't (the operand came from s but the result is stored into t).
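We can mimic that peek-ahead check from pure Python by scanning a function's bytecode for the telltale pattern (a simplified sketch using dis on Python 3.10 bytecode, where BINARY_ADD still exists; the real C code also handles stores other than STORE_FAST). It uses the fast and slow functions defined above:

import dis

def would_optimize(func):
    # Flag BINARY_ADD sites whose left operand was loaded from the same
    # local that the very next instruction stores into -- the pattern
    # CPython 3.10's fast path looks for.
    ins = list(dis.get_instructions(func))
    for i, op in enumerate(ins):
        if op.opname != 'BINARY_ADD' or i < 2 or i + 1 >= len(ins):
            continue
        left, store = ins[i - 2], ins[i + 1]
        if (left.opname == 'LOAD_FAST' and store.opname == 'STORE_FAST'
                and left.argval == store.argval):
            print(f'{func.__name__}: in-place candidate at offset {op.offset}')

would_optimize(fast)  # reports the s = s + 'c' site
would_optimize(slow)  # reports nothing: the result goes to t, not s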
There's no guarantee this optimization will persist, though. For example, I noticed that your fast() is no faster than your slow() on the current development CPython main branch (which is the current work-in-progress toward an eventual 3.11 release).
As noted, there's no guarantee this optimization will persist. "Serious" Python programmers should know better than to rely on dodgy CPython-specific tricks, and, indeed, PEP 8 explicitly warns against relying on this specific one:
Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).
For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b ...
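The portable idiom PEP 8 recommends instead is to collect the pieces in a list and join once, which is linear time on any implementation:

def portable(n):
    # Build all the pieces first, then concatenate once with str.join.
    parts = []
    for _ in range(n):
        parts.append('c')
    return ''.join(parts)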