Short version: If `s` is a string, then `s = s + 'c'` might modify the string in place, while `t = s + 'c'` can't. But how does the operation `s + 'c'` know which scenario it's in?
Long version: `t = s + 'c'` needs to create a separate string because the program afterwards wants both the old string as `s` and the new string as `t`. `s = s + 'c'` can modify the string in place if `s` is the only reference, as the program only wants `s` to be the extended string. CPython actually does this optimization, if there's space at the end for the extra character.
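The "both strings are wanted" case is directly observable; a minimal sketch:

```python
s = 'ab'
t = s + 'c'       # must build a new string: s is still needed afterwards

assert s == 'ab'  # the original string is unchanged
assert t == 'abc' # the new, extended string
```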
Consider these functions, which repeatedly add a character:

```python
def fast(n):
    s = ''
    for _ in range(n):
        s = s + 'c'
        t = s
        del t

def slow(n):
    s = ''
    for _ in range(n):
        t = s + 'c'
        s = t
        del t
```
Benchmark results with `n = 100_000` (Try it online!):

```
fast :   9 ms    9 ms    9 ms    9 ms   10 ms
slow : 924 ms  927 ms  931 ms  933 ms  945 ms
```
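The timings can be reproduced with the standard `timeit` module; a minimal sketch (`n` is scaled down so the quadratic version still finishes quickly, and absolute numbers will vary by machine and CPython version):

```python
import timeit

def fast(n):
    s = ''
    for _ in range(n):
        s = s + 'c'
        t = s
        del t

def slow(n):
    s = ''
    for _ in range(n):
        t = s + 'c'
        s = t
        del t

# Time one call of each; on CPython 3.10, fast() scales roughly
# linearly in n while slow() scales roughly quadratically.
for f in (fast, slow):
    seconds = timeit.timeit(lambda: f(10_000), number=1)
    print(f'{f.__name__}: {seconds * 1000:.0f} ms')
```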
Note that the extra `t = s` or `s = t` makes both variables equivalent references to the string, and then `del t` leaves only `s`, so at the next loop iteration, `s` is again the only reference to the string. Thus the only difference between the two functions is the order in which `s + 'c'` is assigned to `s` and `t`.
Let's also disassemble the bytecode. I marked the only three differences with `!=` in the middle. As expected, only the variables for `STORE_FAST` and `LOAD_FAST` differ. But up to and including the `BINARY_ADD`, the bytecode is identical. So how does the `BINARY_ADD` know whether to optimize or not?
```
import dis                                  import dis
dis.dis(fast)                               dis.dis(slow)
---------------------------------------------------------------------------
      0 LOAD_CONST     1 ('')                     0 LOAD_CONST     1 ('')
      2 STORE_FAST     1 (s)                      2 STORE_FAST     1 (s)
      4 LOAD_GLOBAL    0 (range)                  4 LOAD_GLOBAL    0 (range)
      6 LOAD_FAST      0 (n)                      6 LOAD_FAST      0 (n)
      8 CALL_FUNCTION  1                          8 CALL_FUNCTION  1
     10 GET_ITER                                 10 GET_ITER
>>   12 FOR_ITER      18 (to 32)            >>   12 FOR_ITER      18 (to 32)
     14 STORE_FAST     2 (_)                     14 STORE_FAST     2 (_)
     16 LOAD_FAST      1 (s)                     16 LOAD_FAST      1 (s)
     18 LOAD_CONST     2 ('c')                   18 LOAD_CONST     2 ('c')
     20 BINARY_ADD                               20 BINARY_ADD
     22 STORE_FAST     1 (s)          !=         22 STORE_FAST     3 (t)
     24 LOAD_FAST      1 (s)          !=         24 LOAD_FAST      3 (t)
     26 STORE_FAST     3 (t)          !=         26 STORE_FAST     1 (s)
     28 DELETE_FAST    3 (t)                     28 DELETE_FAST    3 (t)
     30 JUMP_ABSOLUTE 12                         30 JUMP_ABSOLUTE 12
>>   32 LOAD_CONST     0 (None)             >>   32 LOAD_CONST     0 (None)
     34 RETURN_VALUE                             34 RETURN_VALUE
```
Here's the code in question, from the Python 3.10 branch (in `ceval.c`, and called from the same file's implementation of the `BINARY_ADD` opcode). As @jasonharper noted in a comment, it peeks ahead to see whether the result of the `BINARY_ADD` will next be bound to the same name from which the left-hand addend came. In `fast()`, it is (operand came from `s` and result stored into `s`), but in `slow()` it isn't (operand came from `s` but stored into `t`).
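A consequence worth noting: the in-place path also requires that the left operand have no other live references, so even with the `s = s + 'c'` shape, keeping an extra alias across the concatenation should defeat the optimization on CPython 3.10 (a hedged sketch; `aliased` is a hypothetical name, and the performance claim is implementation-dependent):

```python
def aliased(n):
    s = ''
    for _ in range(n):
        keep = s     # a second live reference to the current string
        s = s + 'c'  # refcount of the old string is now > 1, so CPython
                     # cannot resize it in place; expect slow()-like timing
        del keep
    return s
```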
There's no guarantee this optimization will persist, though. For example, I noticed that your `fast()` is no faster than your `slow()` on the current development CPython `main` branch (which is the current work-in-progress toward an eventual 3.11 release).
As noted, there's no guarantee this optimization will persist. "Serious" Python programmers should know better than to rely on dodgy CPython-specific tricks, and, indeed, PEP 8 explicitly warns against relying on this specific one:

> Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).
>
> For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form `a += b` or `a = a + b`. ...
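The portable way to get linear-time behavior on any Python implementation is the idiom PEP 8 points toward: collect the pieces in a list and join once at the end. A minimal sketch (`build` is a hypothetical name):

```python
def build(n):
    parts = []
    for _ in range(n):
        parts.append('c')  # list append is amortized O(1)
    return ''.join(parts)  # one pass over the total length at the end
```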