CPython string addition optimisation failure case

Question

The Question

Why, in CPython, does

def add_string(n):
    s = ''
    for _ in range(n):
        s += ' '

take linear time, but

def add_string_in_list(n):
    l = ['']
    for _ in range(n):
        l[0] += ' '

take quadratic time?

Proof:

Timer(partial(add_string, 1000000)).timeit(1)
#>>> 0.1848409200028982
Timer(partial(add_string, 10000000)).timeit(1)
#>>> 1.1123797750042286

Timer(partial(add_string_in_list, 10000)).timeit(1)
#>>> 0.0033865350123960525
Timer(partial(add_string_in_list, 100000)).timeit(1)
#>>> 0.25131178900483064

What I know

CPython has an optimisation for string addition when the string being added to has a reference count of 1.

This is because strings in Python are immutable, and so normally they cannot be edited. If multiple references exist to a string and it is mutated, both references will see the changed string. This is obviously not wanted, so mutation cannot happen with multiple references.

If there is only one reference to the string, however, mutating the value will only change the string for that one reference, which wants it changed. You can test that this is a likely cause as so:

from timeit import Timer
from functools import partial

def add_string_two_references(n):
    s = ''
    for _ in range(n):
        s2 = s
        s += ' '

Timer(partial(add_string_two_references, 20000)).timeit(1)
#>>> 0.032532954995986074
Timer(partial(add_string_two_references, 200000)).timeit(1)
#>>> 1.0898985149979126

I'm unsure why the factor is only 30x, instead of the expected 100x, but I believe that it's overhead.

What I don't know

So why is the list version creating two references? Is this even what's preventing the optimisation?

You can check that it's not treating normal objects any differently:

class Counter:
    def __iadd__(self, other):
        print(sys.getrefcount(self))

s = Counter()
s += None
#>>> 6

class Counter:
    def __iadd__(self, other):
        print(sys.getrefcount(self))

l = [Counter()]
l[0] += None
#>>> 6

ElmoVanKielmo · Accepted Answer

In the list based approach, string from index 0 of the list is taken and modified before being put back to the list at index 0.
For this short moment interpreter still has the old version of string in the list and can't perform in place modification.
If you take a look at Python's source then you'll see that there is no support for modifying the element of the list in place. So the object (string in this case) has to be retrieved from the list, modified and then put back.
In other words list type is completely agnostic of the str type support for += operator.

And consider the following code:

l = ['abc', 'def']
def nasty():
    global l
    l[0] = 'ghi'
    l[1] = 'jkl'
    return 'mno'
l[0] += nasty()

The value of l is ['abcmno', 'jkl'] which proves that 'abc' was taken from the list, then nasty() got executed modifying the contents of the list, strings 'abc' and 'mno' got concatenated and result was assigned to l[0]. If nasty() was evaluated before accessing l[0] to modify it in place, then the result would be 'ghimno'.

Fred Foo · Answer

So why is the list version creating two references?

In l[0] += ' ', one reference is in l[0]. One reference is created temporarily to perform the += on.

Here are two simpler functions to show the effect:

>>> def f():
...     l = ['']
...     l[0] += ' '
...     
>>> def g():
...     s = ''
...     s += ' '
...

Disassembling them gives

>>> from dis import dis
>>> dis(f)
  2           0 LOAD_CONST               1 ('')
              3 BUILD_LIST               1
              6 STORE_FAST               0 (l)

  3           9 LOAD_FAST                0 (l)
             12 LOAD_CONST               2 (0)
             15 DUP_TOPX                 2
             18 BINARY_SUBSCR       
             19 LOAD_CONST               3 (' ')
             22 INPLACE_ADD         
             23 ROT_THREE           
             24 STORE_SUBSCR        
             25 LOAD_CONST               0 (None)
             28 RETURN_VALUE        
>>> dis(g)
  2           0 LOAD_CONST               1 ('')
              3 STORE_FAST               0 (s)

  3           6 LOAD_FAST                0 (s)
              9 LOAD_CONST               2 (' ')
             12 INPLACE_ADD         
             13 STORE_FAST               0 (s)
             16 LOAD_CONST               0 (None)
             19 RETURN_VALUE

In f, the BINARY_SUBSCR (slicing) instruction puts l[0] at the top of the VM stack. DUP_TOPX duplicates the top n items on the stack. Both functions (see ceval.c) increase the reference count; DUP_TOPX (DUP_TOP_TWO in Py3) does it directly, while BINARY_SUBSCR uses PyObject_GetItem. So, the reference count of the string is now at least three.

g doesn't have this problem. It creates one additional reference when the item is pushed with LOAD_FAST, giving a refcount of two, the minimal number for an item on the VM stack, so it can do the optimization.

CPython string addition optimisation failure case

Tags:

python

string

optimization

cpython

The Question

What I know

What I don't know

Veedrac

2 Answers

ElmoVanKielmo

Fred Foo

Recent Activity

Donate For Us

CPython string addition optimisation failure case

Tags:

python

string

optimization

cpython

The Question

What I know

What I don't know

Veedrac

2 Answers

ElmoVanKielmo

Fred Foo

Related questions

Recent Activity

Donate For Us