
String immutability in CPython violated

This is more of an 'interesting' phenomenon I encountered in a Python module and am trying to understand, rather than a request for help (though a solution would also be useful).

>>> import fuzzy
>>> s = fuzzy.Soundex(4)
>>> a = "apple"
>>> b = a
>>> sdx_a = s(a)
>>> sdx_a
'A140'
>>> a
'APPLE'
>>> b
'APPLE'

Yeah, so the fuzzy module totally violates the immutability of strings in Python. Is it able to do this because it is a C extension? And does this constitute an error in CPython as well as in the module, or even a security risk?
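
For contrast, a minimal sketch of what the same objects look like from pure Python: item assignment on a str raises TypeError, and a method such as upper() returns a new object while both names keep the original value. The helper name try_item_assignment is made up for illustration; it is not part of fuzzy or the standard library.

```python
def try_item_assignment(text: str) -> str:
    """Attempt text[0] = "A" and return the error message Python raises."""
    try:
        text[0] = "A"  # str has no __setitem__, so this raises TypeError
    except TypeError as exc:
        return str(exc)
    return "no error"  # not reached for a str argument


a = "apple"
b = a                      # second name bound to the same object
message = try_item_assignment(a)
upper = a.upper()          # builds a new string; a itself is untouched
print(message)             # 'str' object does not support item assignment
print(a, b, upper)         # apple apple APPLE
```

So from Python code the only way for a and b to change, as in the transcript above, is for C code to write into the object's character buffer directly.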

Also, can anyone think of a way to get around this behaviour? I would like to be able to keep the original capitalisation of the string.

Cheers,

Alex

asked Apr 30 '12 at 03:04 by Alex



3 Answers

This bug was resolved back in February; update your version.

To answer your question, yes, there are several ways to modify immutable types at the C level. The security implications are unknown, and possibly even unknowable, at this point.
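
One such C-level route can be sketched from pure Python with ctypes, which lets you write to raw memory. This is a CPython-only demonstration under several assumptions about interpreter internals: that id() returns the object's address, and that the data of a bytes object (the Python 3 counterpart of the Python 2 str used in the question) starts at sys.getsizeof(b"") - 1 bytes into the object. The helper name overwrite_bytes_in_place is hypothetical, and this is not a supported API; it corrupts any cached hash and should never be used on shared or already-hashed objects.

```python
import ctypes
import sys

# Assumption (CPython internals): sys.getsizeof(b"") is the fixed bytes header
# plus one byte for the trailing NUL, so subtracting 1 gives the data offset.
BYTES_DATA_OFFSET = sys.getsizeof(b"") - 1


def overwrite_bytes_in_place(obj: bytes, new_data: bytes) -> None:
    """Write new_data over obj's buffer, bypassing bytes immutability.

    Demonstration only: a hash computed before the write becomes stale.
    """
    if type(obj) is not bytes or len(new_data) != len(obj):
        raise ValueError("need an exact bytes object and data of equal length")
    # In CPython, id(obj) is the address of the object in memory.
    ctypes.memmove(id(obj) + BYTES_DATA_OFFSET, new_data, len(new_data))


# Build the object at run time so no compile-time constant gets modified.
original = bytes(bytearray(b"apple"))
alias = original                       # like b = a in the question
overwrite_bytes_in_place(original, b"APPLE")
print(original, alias)                 # both names now see b'APPLE'
```

A C extension that writes through the pointer it gets from the string does the same thing without ctypes, which is why the interpreter cannot enforce immutability against C code.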

answered Sep 29 '22 at 13:09 by Ignacio Vazquez-Abrams


I don't have the fuzzy module available to test right now, but the following creates a string with a new identity:

>>> a = "hello"
>>> b = ''.join(a)
>>> b
'hello'
>>> id(a), id(b)
(182894286096, 182894559280)
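
The same idea, wrapped as a small helper so the copy can be passed to the extension instead of the original. This is a sketch; the name independent_copy is hypothetical. It uses ''.join rather than slicing or str() because in CPython a[:] and str(a) return the very same object for an exact str, which would not protect the original. Zero- and one-character strings may still come back as shared cached objects.

```python
def independent_copy(text: str) -> str:
    """Return a string equal to text, built by str.join as a separate object."""
    return "".join(text)


a = "apple"
copy_of_a = independent_copy(a)
print(copy_of_a == a, copy_of_a is a)  # True False
print(a[:] is a)                       # True: slicing gives no protection
```

With the affected fuzzy build from the question, the call would become something like sdx_a = s(independent_copy(a)), so only the throwaway copy gets uppercased.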

answered Sep 30 '22 at 13:09 by Greg Hewgill


I don't know much about CPython internals, but it looks like fuzzy.c declares char *cs = s, where s is the argument passed to __call__. cs is not a copy: it points at the same character buffer as s, so every write to cs[i] also changes s[i] and therefore the original string object. This is definitely a bug in fuzzy, and you should file it on Bitbucket. As Greg's answer says, using ''.join(a) creates a new copy that you can pass in instead.
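
The aliasing described above can be reproduced safely in pure Python with a mutable bytearray standing in for the string buffer. This is an analogy, not fuzzy's actual code: the function name upper_ascii_in_place is hypothetical, and the uppercase step is assumed from the 'APPLE' output in the question rather than checked against fuzzy.c.

```python
def upper_ascii_in_place(buf: bytearray) -> None:
    """Uppercase ASCII letters by writing into buf, like writing through cs[i]."""
    for i, ch in enumerate(buf):
        if ord("a") <= ch <= ord("z"):
            buf[i] = ch - 32  # ASCII lowercase and uppercase differ by 32


a = bytearray(b"apple")
b = a                            # same object under two names
upper_ascii_in_place(a)
print(a, b)                      # both show bytearray(b'APPLE')

c = bytearray(b"apple")
upper_ascii_in_place(bytearray(c))  # pass a copy: only the copy is changed
print(c)                         # bytearray(b'apple')
```

Writing through a second reference to the same buffer changes what every name sees; handing the function a copy is what keeps the original intact.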

answered Oct 01 '22 at 13:10 by Venge