Why does the id of an immutable string stop changing after the second += in Python 3? [duplicate]

I'm using Python 3.8.3 and got some unexpected output when checking the ids of strings:

>>> a="d"
>>> id(a)
1984988052656
>>> a+="e"
>>> id(a)
1985027888368
>>> a+="h"
>>> id(a)
1985027888368
>>> a+="i"
>>> id(a)
1985027888368
>>> 

After the line that appends "h" to a, id(a) no longer changes. How is that possible when strings are immutable? I get the same output when I use a=a+"h" instead of a+="h", and also when I run the code from a .py file. (I mention this because there are situations where running code in the interactive shell gives different output than running the same code saved to a file.)

Chamod asked Jun 28 '20 10:06


2 Answers

This is only possible because of a weird, slightly sketchy optimization for string concatenation in CPython's bytecode evaluation loop. The INPLACE_ADD implementation special-cases the situation where both operands are exact string objects:

case TARGET(INPLACE_ADD): {
    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *sum;
    if (PyUnicode_CheckExact(left) && PyUnicode_CheckExact(right)) {
        sum = unicode_concatenate(tstate, left, right, f, next_instr);
        /* unicode_concatenate consumed the ref to left */
    }
    else {
        ...

In that case it calls a unicode_concatenate helper, which delegates to PyUnicode_Append; PyUnicode_Append tries to mutate the original string in place:

void
PyUnicode_Append(PyObject **p_left, PyObject *right)
{
    ...
    if (unicode_modifiable(left)
        && PyUnicode_CheckExact(right)
        && PyUnicode_KIND(right) <= PyUnicode_KIND(left)
        /* Don't resize for ascii += latin1. Convert ascii to latin1 requires
           to change the structure size, but characters are stored just after
           the structure, and so it requires to move all characters which is
           not so different than duplicating the string. */
        && !(PyUnicode_IS_ASCII(left) && !PyUnicode_IS_ASCII(right)))
    {
        /* append inplace */
        if (unicode_resize(p_left, new_len) != 0)
            goto error;

        /* copy 'right' into the newly allocated area of 'left' */
        _PyUnicode_FastCopyCharacters(*p_left, left_len, right, 0, right_len);
    }
    ...
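The two excerpts can be poked at from Python in a rough, CPython-specific way. The sketch below uses `dis` to show which opcode `+=` compiles to (INPLACE_ADD on 3.8; 3.11 and later fold every binary operator into BINARY_OP), and a hypothetical helper, `count_in_place_appends`, that counts how many appends left id() unchanged. The count depends on the CPython version and the memory allocator (realloc may move the string), so treat it as an observation rather than a guarantee.

```python
import dis


def add_opcode(source='a += "e"'):
    """Return the augmented-add opcode name for `source` and the opcode right after it."""
    code = compile(source, "<demo>", "exec")
    names = [ins.opname for ins in dis.get_instructions(code)]
    for i, name in enumerate(names):
        # Python 3.8 emits INPLACE_ADD; 3.11+ uses the generic BINARY_OP instead.
        if name in ("INPLACE_ADD", "BINARY_OP"):
            return name, names[i + 1]
    raise ValueError("no add instruction found")


def count_in_place_appends(steps=50):
    """Hypothetical helper: count `s += "x"` steps that kept id(s) unchanged (CPython detail)."""
    s = "".join(["a", "b"])  # built at runtime: not interned, no cached hash, one reference
    kept = 0
    for _ in range(steps):
        before = id(s)  # id() neither keeps the string alive nor caches its hash
        s += "x"
        if id(s) == before:
            kept += 1
    return kept


print(add_opcode())               # INPLACE_ADD on 3.8, BINARY_OP on 3.11+
print(count_in_place_appends())   # varies with version and allocator
```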

The optimization only happens if unicode_concatenate can guarantee that there are no other references to the LHS. Your initial a="d" had other references, because CPython keeps a cache of 1-character strings in the Latin-1 range, so the optimization didn't trigger. The optimization can also fail to trigger in a few other cases, such as when the LHS has a cached hash, or when realloc needs to move the string (in which case most of the optimization's code path executes, but it doesn't manage to perform the operation in place).
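A small CPython-only sketch of two of those conditions. `chr(100) is chr(100)` exposes the shared 1-character cache, and the hypothetical helpers `append_to_cached_char` and `append_after_hashing` show that a shared object or a cached hash makes += allocate a new string. In both cases the old string is still alive while the new one is created, so the two ids have to differ.

```python
import sys


def one_char_strings_are_shared():
    # CPython keeps one object per Latin-1 character, so both calls return that object.
    return chr(100) is chr(100)


def append_to_cached_char():
    s = chr(100)      # "d": the character cache holds another reference to it
    before = id(s)
    s += "e"          # cannot be done in place, so a new "de" object is allocated
    return id(s) == before


def append_after_hashing():
    s = "".join(["a", "b"])  # runtime-built string with a single reference
    hash(s)                  # stores the hash inside the string object
    before = id(s)
    s += "x"                 # a cached hash makes the string non-modifiable
    return id(s) == before


if sys.implementation.name == "cpython":
    print(one_char_strings_are_shared(), append_to_cached_char(), append_after_hashing())
```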


This optimization violates the normal rules for id and +=. Normally, += on immutable objects is supposed to create a new object before clearing the reference to the old object, so the new and old objects should have overlapping lifetimes, forbidding equal id values. With the optimization in place, the string after the += has the same ID as the string before the +=.
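For contrast, a minimal sketch of the normal rule, using a hypothetical `concat_with_alias` helper: when a second name still refers to the string, += cannot touch it and must produce a separate object. The second helper reflects the documented guarantee for id(), which only requires distinct values among objects whose lifetimes overlap.

```python
def concat_with_alias():
    s = "".join(["ab", "c"])  # runtime-built "abc"
    alias = s                 # a second reference rules out in-place modification
    s += "d"                  # builds a new object; "abc" is left untouched
    return s, alias


def live_ids_are_distinct(n=1000):
    # Objects that are all alive at the same time must have distinct id() values.
    objs = [object() for _ in range(n)]
    return len({id(o) for o in objs}) == n


new, old = concat_with_alias()
print(new, old, new is old)   # abcd abc False
print(live_ids_are_distinct())
```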

The language developers decided they cared more about people who would put string concatenation in a loop, see bad performance, and assume Python sucks, than they cared about this obscure technical point.
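For the loop case, PEP 8 advises against relying on CPython's in-place concatenation for `a += b` or `a = a + b`, because the optimization is fragile even in CPython and absent in implementations that don't use reference counting; it recommends `''.join()` in performance-sensitive code. A minimal comparison (the helper names are illustrative; timings depend entirely on the interpreter):

```python
import timeit


def build_with_plus(parts):
    out = ""
    for p in parts:
        out += p  # cheap on CPython only while the in-place trick applies
    return out


def build_with_join(parts):
    return "".join(parts)  # the portable idiom PEP 8 recommends


parts = ["x"] * 10_000
assert build_with_plus(parts) == build_with_join(parts)
for fn in (build_with_plus, build_with_join):
    print(fn.__name__, timeit.timeit(lambda: fn(parts), number=20))
```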

user2357112 supports Monica answered Oct 24 '22 14:10


Somewhat of a guess here: when the GC runs, it's allowed to compact or reorganize memory, and in doing so it's well within its rights to reuse old addresses as long as they are now free. By calling a+="h" you've created a new immutable string and lost the reference to the string a previously pointed to. That string becomes eligible for garbage collection, which means the address it used to occupy can be reused.
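Address reuse itself is easy to observe. In CPython an object is freed by reference counting as soon as its last reference disappears, so the next allocation of the same size often lands at the same address. The hypothetical helper below counts how often that happens; the number varies between runs and implementations and is not something the language guarantees.

```python
def count_reused_ids(trials=1000):
    """Hypothetical helper: count trials where a new object got a just-freed object's id."""
    hits = 0
    for _ in range(trials):
        first = id(object())   # in CPython the temporary is released right after id() returns
        second = id(object())  # a new object of the same size may reuse that address
        if first == second:
            hits += 1
    return hits


print(count_reused_ids())  # implementation- and allocator-dependent
```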

Mureinik answered Oct 24 '22 16:10