Does Python handle string objects in a copy-on-write style?

I noticed that Python keeps only one copy of a string object, as in the code below:

>>> s1='abcde'
>>> s2='abcde'
>>> s1 is s2
True

s1 and s2 point to the same object.

When I edit s1, s2 still refers to the original object ('abcde'), but s1 points to a new copy. This behavior looks like copy-on-write.

>>> s1=s1+'f'
>>> s1 is s2
False
>>> s1
'abcdef'
>>> s2
'abcde'

So does Python really use a copy-on-write mechanism for string objects?

roast_soul asked Feb 25 '15 05:02



3 Answers

No copying is taking place in any relevant sense. Your new string is an entirely new string object. It is no different than if you had done s1 = 'abcdef'. Some kinds of objects in Python allow you to modify them "in-place", but not strings. (In Python parlance, strings are immutable.)
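To make the difference between in-place modification and rebinding concrete, here is a minimal sketch. The variable names items and alias are mine, not from the question; the list case is only there for contrast.

```python
# A list is mutable: += modifies the existing object in place.
items = [1, 2, 3]
alias = items
items += [4]
list_same_object = items is alias   # both names still refer to one list

# A str is immutable: + builds a brand-new object and rebinds the name s1.
s1 = "abcde"
s2 = s1
s1 = s1 + "f"
str_same_object = s1 is s2          # s2 still refers to the old 'abcde'

print(list_same_object, alias)      # True [1, 2, 3, 4]
print(str_same_object, s1, s2)      # False abcdef abcde
```

Nothing was ever copied from s2; the name s1 was simply pointed at a different object.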

Note that the fact that your two original strings are the same object is due to an implementation-specific optimization and will not always be true:

>>> s1 = 'this is a longer string than yours'
>>> s2 = 'this is a longer string than yours'
>>> s1 is s2
False
BrenBarn answered Sep 30 '22 03:09


Yes, s1 and s2 point to the same object, because they are interned (according to some rules):

In [73]: s1='abcde'

In [74]: s2='abcde'

In [75]: id(s1), id(s2), s1 is s2
Out[75]: (63060096, 63060096, True)

One such rule is that the string may contain only ASCII letters, digits, or underscores:

In [77]: s1='abcde!'

In [78]: s2='abcde!'

In [79]: id(s1), id(s2), s1 is s2
Out[79]: (84722496, 84722368, False)

Interestingly, all strings of length 0 and length 1 are also interned by default:

In [80]: s1 = "_"

In [81]: s2 = "_"

In [82]: id(s1), id(s2), s1 is s2
Out[82]: (8144656, 8144656, True)

In [83]: s1 = "!"

In [84]: s2 = "!"

In [85]: id(s1), id(s2), s1 is s2
Out[85]: (8849888, 8849888, True)

If I produce the string at run time, it won't be interned:

In [86]: s1 = "abcde"

In [87]: s2 = "".join(['a', 'b', 'c', 'd', 'e'])

In [88]: id(s1), id(s2), s1 is s2
Out[88]: (84722944, 84723648, False)

"...during peephole optimization is called constant folding and consists in simplifying constant expressions in advance" (from this link), and the results of such expressions are interned according to the rules above:

In [91]: 'abc' +'de' is 'abcde'
Out[91]: True

In [92]: def foo():
    ...:     print "abc" + 'de'
    ...:     

In [93]: def foo1():
    ...:     print "abcde"
    ...:     

In [94]: dis.dis(foo)
  2           0 LOAD_CONST               3 ('abcde')
              3 PRINT_ITEM          
              4 PRINT_NEWLINE       
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE        

In [95]: dis.dis(foo1)
  2           0 LOAD_CONST               1 ('abcde')
              3 PRINT_ITEM          
              4 PRINT_NEWLINE       
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE        
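The session above is Python 2 (print statement, PRINT_ITEM opcode). As a rough Python 3 sketch of the same check, you can compile the expression and look at the constants stored in the code object. This assumes CPython; other implementations and versions may not fold constants the same way, so I print the result rather than rely on it.

```python
import dis

# Compile the constant expression without running it.
code = compile("'abc' + 'de'", "<example>", "eval")

# On CPython the folded result is normally stored as a constant,
# but this is an implementation detail, not a language guarantee.
folded = "abcde" in code.co_consts
print("folded at compile time:", folded)

# Show the bytecode; opcode names differ between Python versions.
dis.dis(code)

value = eval(code)
print(value)
```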

and this only applies when the resulting string has a length of at most 20 (in the interpreter version used here):

In [96]: "a" * 20 is 'aaaaaaaaaaaaaaaaaaaa'
Out[96]: True

In [97]: 'a' * 21 is 'aaaaaaaaaaaaaaaaaaaaa'
Out[97]: False

All of this is only possible because Python strings are immutable; you can't edit them:

In [98]: s1 = "abcde"

In [99]: s1[2] = "C"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-99-1d7c49892017> in <module>()
----> 1 s1[2] = "C"

TypeError: 'str' object does not support item assignment

Python 2 provides the built-in function intern; in Python 3.x it lives in the sys module as sys.intern:

In [100]: s1 = 'this is a longer string than yours'

In [101]: s2 = 'this is a longer string than yours'

In [102]: id(s1), id(s2), s1 is s2
Out[102]: (84717088, 84717032, False)

In [103]: s1 = intern('this is a longer string than yours')

In [104]: s2 = intern('this is a longer string than yours')

In [105]: id(s1), id(s2), s1 is s2
Out[105]: (84717424, 84717424, True)
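For Python 3, a minimal sketch of the same idea with sys.intern follows. The strings are built at run time on purpose (the parts list is my own illustration) so that the compiler cannot share them as constants.

```python
import sys

# Build both strings at run time so they are separate, non-constant objects.
parts = ["this is a longer", "string than yours"]
a = " ".join(parts)
b = " ".join(parts)

# Value comparison is always reliable, interned or not.
equal_values = a == b

# sys.intern returns one canonical object for equal strings.
interned_same = sys.intern(a) is sys.intern(b)

print(equal_values, interned_same)   # True True
```

Whether a is b holds before interning is an implementation detail, so compare strings with == unless you have interned them yourself.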

You can read more at the links below:

http://guilload.com/python-string-interning/

Does Python intern strings?

namit answered Sep 30 '22 04:09


It simply creates a new string object!

s1=s1+'f'

is no different from:

s1 = 'abcdef'

Note that this can slow down your program significantly if you append to a string many times, because every concatenation creates a new string and copies the existing characters into it. This is a known anti-pattern: building a string of total length N this way can take O(N^2) time.
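A small sketch of the usual alternative: collect the pieces and join them once. The function names build_with_concat and build_with_join are made up for this illustration.

```python
def build_with_concat(pieces):
    """Repeated concatenation: each + may allocate a new, longer string."""
    result = ""
    for piece in pieces:
        result = result + piece
    return result


def build_with_join(pieces):
    """str.join builds the final string in a single call."""
    return "".join(pieces)


pieces = [str(i) for i in range(1000)]
same_result = build_with_concat(pieces) == build_with_join(pieces)
print(same_result)   # True
```

Both produce the same string; the join version just avoids creating all the intermediate strings.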

Secret answered Sep 30 '22 03:09