I'm using Python 3.8.3 and got the unexpected output below when checking the id of strings.
>>> a="d"
>>> id(a)
1984988052656
>>> a+="e"
>>> id(a)
1985027888368
>>> a+="h"
>>> id(a)
1985027888368
>>> a+="i"
>>> id(a)
1985027888368
>>>
After the line that appends "h" to a, id(a) didn't change. How is that possible when strings are immutable? I got the same output when I used a=a+"h" instead of a+="h", and also when I ran this code from a .py file (I mention that because in some situations the shell and a saved file give different output for the same code).
This behavior violates the rules of how id values and += are supposed to work: the id values produced with the optimization in place would be not only impossible, but prohibited, under the unoptimized semantics. The developers, however, cared more about people who would see bad concatenation performance and assume Python sucks.
The string itself is immutable but the label can change. Assigning a new value to an existing variable is perfectly valid.
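The difference between rebinding a name and mutating an object can be sketched in a few lines. This is only an illustration; the variable names are made up for the example.

```python
# Rebinding a name versus mutating an object.
s = "d"
original = s          # a second name bound to the same string object

s = s + "e"           # builds a new string and rebinds the name s
print(s)              # de
print(original)       # d  (the original object is untouched)

try:
    s[0] = "x"        # str does not support item assignment
except TypeError as exc:
    print("TypeError:", exc)
```

Assignment only changes which object the name refers to; the TypeError shows that the string object itself cannot be changed.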
In Python, strings are immutable so that programmers cannot alter the contents of the object, even by mistake. This avoids a class of bugs. Other immutable types include int, float, tuple, and bool.
In Python, string operators are the operations that can be applied to string values. Python supports several of them, including the assignment operator "=" and the concatenation operator "+".
This is only possible due to a weird, slightly-sketchy optimization for string concatenation in the bytecode evaluation loop. The INPLACE_ADD implementation special-cases two string objects:
case TARGET(INPLACE_ADD): {
    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *sum;
    if (PyUnicode_CheckExact(left) && PyUnicode_CheckExact(right)) {
        sum = unicode_concatenate(tstate, left, right, f, next_instr);
        /* unicode_concatenate consumed the ref to left */
    }
    else {
        ...
and calls a unicode_concatenate helper that delegates to PyUnicode_Append, which tries to mutate the original string in-place:
void
PyUnicode_Append(PyObject **p_left, PyObject *right)
{
    ...
    if (unicode_modifiable(left)
        && PyUnicode_CheckExact(right)
        && PyUnicode_KIND(right) <= PyUnicode_KIND(left)
        /* Don't resize for ascii += latin1. Convert ascii to latin1 requires
           to change the structure size, but characters are stored just after
           the structure, and so it requires to move all characters which is
           not so different than duplicating the string. */
        && !(PyUnicode_IS_ASCII(left) && !PyUnicode_IS_ASCII(right)))
    {
        /* append inplace */
        if (unicode_resize(p_left, new_len) != 0)
            goto error;
        /* copy 'right' into the newly allocated area of 'left' */
        _PyUnicode_FastCopyCharacters(*p_left, left_len, right, 0, right_len);
    }
    ...
The optimization only happens if unicode_concatenate can guarantee there are no other references to the LHS. Your initial a="d" had other references, since Python uses a cache of 1-character strings in the Latin-1 range, so the optimization didn't trigger. The optimization can also fail to trigger in a few other cases, such as if the LHS has a cached hash, or if realloc needs to move the string (in which case most of the optimization's code path executes, but it doesn't succeed in performing the operation in-place).
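The "no other references" condition can be observed from Python. The following is a minimal sketch, not a test of CPython internals: the function names are hypothetical, and whether the second function reports an unchanged id depends on the interpreter version, the allocator, and the surrounding bytecode.

```python
def append_with_alias():
    a = "".join(["d", "e"])   # fresh string built at run time
    alias = a                 # a second reference to the same object
    a += "h"                  # must yield a new object; alias keeps the old one
    return a, alias


def append_without_alias():
    a = "".join(["d", "e"])   # fresh string with no other references
    before = id(a)
    a += "h"                  # CPython may resize the string in place here
    return before == id(a)    # often True on CPython, never guaranteed


a, alias = append_with_alias()
print(a, alias, a is alias)   # deh de False
print(append_without_alias())
```

With an alias alive, the old string must stay intact, so the in-place path cannot be taken and the two names end up on different objects. Without one, the id comparison may come out equal, which is the effect seen in the question.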
This optimization violates the normal rules for id and +=. Normally, += on immutable objects is supposed to create a new object before clearing the reference to the old object, so the new and old objects should have overlapping lifetimes, forbidding equal id values. With the optimization in place, the string after the += has the same ID as the string before the +=.
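For contrast, here is how += behaves on a mutable type and on an immutable type that has no such special case. This is a small illustration with made-up variable names; the old objects are kept alive through a second name so the identity checks are unambiguous.

```python
# list: += calls list.__iadd__, which extends the same object in place
items = [1, 2]
same_list = items
items += [3]
print(items is same_list)   # True
print(same_list)            # [1, 2, 3]

# tuple: no in-place add, so += builds a new tuple and rebinds the name
point = (1, 2)
old_point = point
point += (3,)
print(point is old_point)   # False
print(old_point)            # (1, 2)
```

The tuple case is the "normal" behavior described above: the new object exists while the old one is still alive, so the two cannot share an id.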
The language developers decided they cared more about people who would put string concatenation in a loop, see bad performance, and assume Python sucks, than they cared about this obscure technical point.
This is somewhat of a guess: when the GC runs, it's allowed to compact/reorganize the memory. In doing so, it's well within its rights to reuse old addresses as long as they are now free. By calling a+="h" you've created a new immutable string, but lost the reference to the string a previously pointed to. That string becomes eligible for garbage collection, meaning the address it used to occupy can be reused.
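Whether or not the collector moves anything (as far as I know, CPython does not relocate live objects), the narrower point about address reuse matches the documented contract of id(): two objects with non-overlapping lifetimes may have the same id. A rough sketch, with a hypothetical helper name, and with the first result depending on the allocator:

```python
def ids_of_two_temporaries():
    first = id(object())    # in CPython the temporary is freed right after the call
    second = id(object())   # a new object that may land at the same address
    return first, second


first, second = ids_of_two_temporaries()
print(first == second)      # frequently True on CPython, but not guaranteed

# Objects that are alive at the same time always have distinct ids.
x, y = object(), object()
print(id(x) != id(y))       # True
```

This shows that equal ids alone do not prove two values were the same object, only that their lifetimes did not overlap.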