I noticed that in Python, identical string literals seem to share a single object. For example:
>>> s1='abcde'
>>> s2='abcde'
>>> s1 is s2
True
s1 and s2 point to the same object.
When I modify s1, s2 still refers to the original object ('abcde'), while s1 points to a new object. This behavior looks like copy-on-write.
>>> s1=s1+'f'
>>> s1 is s2
False
>>> s1
'abcdef'
>>> s2
'abcde'
So does Python really use a copy-on-write mechanism for string objects?
Python has support for shallow copying and deep copying functionality via its copy module. However it does not provide for copy-on-write semantics.
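As a rough illustration of what the copy module does, here is a small sketch. The variable names (nested, shallow, deep, text) are only for this example. It shows the shallow/deep distinction on a nested list, and that for an immutable str the copy module simply returns the original object.

```python
import copy

text = "abcde"
nested = [[1, 2], [3, 4]]

shallow = copy.copy(nested)    # new outer list, inner lists are shared
deep = copy.deepcopy(nested)   # new outer list and new inner lists

shallow_shares_inner = shallow[0] is nested[0]
deep_shares_inner = deep[0] is nested[0]

# For an immutable str, both functions return the very same object.
text_copy_is_same = copy.copy(text) is text
text_deepcopy_is_same = copy.deepcopy(text) is text

print(shallow_shares_inner, deep_shares_inner)
print(text_copy_is_same, text_deepcopy_is_same)
```

Neither function involves copy-on-write: the copy, when one is made at all, happens immediately.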
In Python, strings are immutable, meaning that their value cannot change once they are created. Assigning an existing string to a new variable does not create a copy of it; both names simply refer to the same object.
Python does slice-by-copy, meaning every time you slice (except for very trivial slices, such as a[:] ), it copies all of the data into a new string object. The [slice-by-reference] approach is more complicated, harder to implement and may lead to unexpected behavior.
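The slice syntax [:] and the str() function are often suggested as ways to copy a string, as is assigning the string to a new variable. In CPython none of these produces a separate object: for an ordinary str, each one gives back a reference to the same string. Because strings are immutable, a separate copy is rarely needed anyway.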
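Whether these expressions really produce a second object can be checked directly with the is operator. The sketch below uses illustrative names (sliced, converted, partial); the identity results it prints are CPython implementation details rather than language guarantees, so no expected output is given for them.

```python
s = "abcde"

sliced = s[:]        # full slice
converted = str(s)   # str() applied to something that is already a str
partial = s[1:4]     # a real sub-slice builds a new, shorter string

# Equality is guaranteed; identity depends on the interpreter.
print(sliced == s, sliced is s)
print(converted == s, converted is s)
print(partial, partial is s)
```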
No copying is taking place in any relevant sense. Your new string is an entirely new string object. It is no different than if you had done s1 = 'abcdef'. Some kinds of objects in Python allow you to modify them "in-place", but not strings. (In Python parlance, strings are immutable.)
Note that the fact that your two original strings are the same object is due to an implementation-specific optimization and will not always be true:
>>> s1 = 'this is a longer string than yours'
>>> s2 = 'this is a longer string than yours'
>>> s1 is s2
False
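Since identity between equal strings is not guaranteed, value comparisons should use == rather than is. A minimal sketch, with strings built at runtime so that no compile-time sharing is involved (the names first and second are only for illustration):

```python
words = ["this", "is", "a", "longer", "string"]

first = " ".join(words)
second = " ".join(words)

same_value = first == second    # compares contents
same_object = first is second   # compares identity; do not rely on this

print(same_value, same_object)
```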
Yes, both s1 and s2 will point to the same object, because they are interned (based on some rules):
In [73]: s1='abcde'
In [74]: s2='abcde'
In [75]: id(s1), id(s2), s1 is s2
Out[75]: (63060096, 63060096, True)
One such rule is that only strings made up of ASCII letters, digits, or underscores are interned:
In [77]: s1='abcde!'
In [78]: s2='abcde!'
In [79]: id(s1), id(s2), s1 is s2
Out[79]: (84722496, 84722368, False)
Interestingly, the empty string and all strings of length 1 are interned by default:
In [80]: s1 = "_"
In [81]: s2 = "_"
In [82]: id(s1), id(s2), s1 is s2
Out[82]: (8144656, 8144656, True)
In [83]: s1 = "!"
In [84]: s2 = "!"
In [85]: id(s1), id(s2), s1 is s2
Out[85]: (8849888, 8849888, True)
If I produce the string at runtime, it won't be interned:
In [86]: s1 = "abcde"
In [87]: s2 = "".join(['a', 'b', 'c', 'd', 'e'])
In [88]: id(s1), id(s2), s1 is s2
Out[88]: (84722944, 84723648, False)
"...during peephole optimization is called constant folding and consists in simplifying constant expressions in advance"
(quoted from a linked article)
Constant expressions folded this way are then interned according to the rules above:
In [91]: 'abc' +'de' is 'abcde'
Out[91]: True
In [92]: def foo():
   ....:     print "abc" + 'de'
   ....:
In [93]: def foo1():
   ....:     print "abcde"
   ....:
In [94]: dis.dis(foo)
  2           0 LOAD_CONST               3 ('abcde')
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
In [95]: dis.dis(foo1)
  2           0 LOAD_CONST               1 ('abcde')
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
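The session above is Python 2. A comparable check in Python 3 might look like the sketch below, where the function names folded and literal are made up for the example. The exact bytecode differs between CPython versions, so the sketch inspects the code object's constants instead of relying on a particular disassembly.

```python
import dis


def folded():
    return "abc" + "de"   # constant expression, folded at compile time


def literal():
    return "abcde"


folded_consts = folded.__code__.co_consts
literal_consts = literal.__code__.co_consts

# The folded result appears among the constants of the compiled function.
print("abcde" in folded_consts, "abcde" in literal_consts)

# The disassembly loads the already-folded constant instead of adding strings.
dis.dis(folded)
```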
This folding only applies when the resulting string is at most 20 characters long (a limit specific to this version of CPython):
In [96]: "a" * 20 is 'aaaaaaaaaaaaaaaaaaaa'
Out[96]: True
In [97]: 'a' * 21 is 'aaaaaaaaaaaaaaaaaaaaa'
Out[97]: False
All of this is safe only because Python strings are immutable; you can't edit them in place:
In [98]: s1 = "abcde"
In [99]: s1[2] = "C"
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-99-1d7c49892017> in <module>()
----> 1 s1[2] = "C"
TypeError: 'str' object does not support item assignment
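Since item assignment is not supported, a "modified" string has to be built as a new object. Two common ways to do that are sketched below; the names s3 and s4 are only for this example.

```python
s1 = "abcde"

# Build a new string from slices around the replaced position.
s3 = s1[:2] + "C" + s1[3:]

# Or go through a mutable list of characters and join it back.
chars = list(s1)
chars[2] = "C"
s4 = "".join(chars)

print(s1, s3, s4)  # s1 itself is unchanged
```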
Python provides the intern built-in function; in Python 3.x it lives in the sys module (sys.intern):
In [100]: s1 = 'this is a longer string than yours'
In [101]: s2 = 'this is a longer string than yours'
In [102]: id(s1), id(s2), s1 is s2
Out[102]: (84717088, 84717032, False)
In [103]: s1 = intern('this is a longer string than yours')
In [104]: s2 = intern('this is a longer string than yours')
In [105]: id(s1), id(s2), s1 is s2
Out[105]: (84717424, 84717424, True)
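In Python 3 the same experiment would use sys.intern. The sketch below builds both strings at runtime, so any sharing comes from the explicit interning call and not from the compiler; the names raw_first, first, and second are illustrative.

```python
import sys

words = ["this", "is", "a", "longer", "string", "than", "yours"]

raw_first = " ".join(words)
raw_second = " ".join(words)

# sys.intern returns the canonical object for that string value.
first = sys.intern(raw_first)
second = sys.intern(raw_second)

print(first == second, first is second)
```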
You can read more at the links below:
http://guilload.com/python-string-interning/
Does Python intern strings?
It is creating a new string object in and of itself!
s1=s1+'f'
is no different to:
s1 = 'abcdef'
Note that this can slow down your program significantly if you append to a string many times, because every concatenation creates a new string and copies the existing contents. This is a known anti-pattern and results in O(N^2) running time in the number of appended pieces.
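The usual alternative is to collect the pieces and join them once at the end. A minimal sketch comparing the two approaches; the function names build_with_concat and build_with_join are hypothetical:

```python
def build_with_concat(parts):
    """Repeated concatenation: each step may create a new string."""
    result = ""
    for part in parts:
        result = result + part
    return result


def build_with_join(parts):
    """Single join: the final string is assembled in one pass."""
    return "".join(parts)


pieces = [str(i) for i in range(5)]
print(build_with_concat(pieces))
print(build_with_join(pieces))
```

Both functions return the same value; only the amount of intermediate copying differs.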