Does string slicing perform copy in memory? [duplicate]

Tags:

python

python-3.x

I'm wondering whether:

    a = "abcdef"
    b = "def"
    if a[3:] == b:
        print("something")

actually copies the "def" part of a somewhere in memory, or whether the characters are compared in place?

Note: I'm asking about a string, not a list (for which I know the answer).

asked Nov 17 '20 08:11

Fred

2 Answers

String slicing makes a copy in CPython.

Looking at the source, this operation is handled in unicodeobject.c:unicode_subscript. There is a special case that reuses memory when the step is 1, the start is 0, and the slice covers the entire string: this goes into unicode_result_unchanged and no copy is made. However, the general case calls PyUnicode_Substring, where all roads lead to a memcpy.
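The special case described above can be observed from Python itself with an identity check. This is only a minimal sketch: it relies on CPython behaviour (other implementations may differ), and the helper name slice_is_same_object is made up for illustration.

```python
def slice_is_same_object(text, start):
    """Return True if text[start:] is the very same object as text."""
    return text[start:] is text


a = "abcdef"

# Whole-string slice: CPython hands back the original object, no copy.
print(slice_is_same_object(a, 0))

# Proper substring: a new string object is built.
print(slice_is_same_object(a, 3))
```

An identity check only says whether a new object was created; it says nothing about how many bytes were copied, which is what the measurement further down addresses.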

To verify these claims empirically, you can use the stdlib memory profiling tool tracemalloc:

    # s.py
    import tracemalloc

    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    a = "." * 7 * 1024**2  # 7 MB of .....   # line 6, first alloc
    b = a[1:]                                # line 7, second alloc
    after = tracemalloc.take_snapshot()

    for stat in after.compare_to(before, 'lineno')[:2]:
        print(stat)

You should see the top two statistics output like this:

    /tmp/s.py:6: size=7168 KiB (+7168 KiB), count=1 (+1), average=7168 KiB
    /tmp/s.py:7: size=7168 KiB (+7168 KiB), count=1 (+1), average=7168 KiB

This result shows two allocations of 7 MiB each, which is strong evidence that the memory was copied. The output also gives the exact line numbers of those allocations.

Try changing the slice from b = a[1:] to b = a[0:] to see the entire-string special case in effect: there should be only one large allocation now, and sys.getrefcount(a) will increase by one.
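Both variants of the experiment can be put side by side. The following is a rough sketch rather than a rigorous benchmark: the helper names bytes_allocated_by_slice and refcount_delta are invented here, and the numbers depend on CPython's reference counting and allocator.

```python
import sys
import tracemalloc


def bytes_allocated_by_slice(text, start):
    """Return how many traced bytes text[start:] adds while it is alive."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    piece = text[start:]  # kept alive until the second measurement
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


def refcount_delta(text, start):
    """Return how much sys.getrefcount(text) grows after slicing."""
    before = sys.getrefcount(text)
    piece = text[start:]  # references text only if no copy was made
    after = sys.getrefcount(text)
    return after - before


size = 7 * 1024**2
a = "." * size  # built at run time: 7 MiB of dots

print(bytes_allocated_by_slice(a, 1))  # roughly 7 MiB: a copy was made
print(bytes_allocated_by_slice(a, 0))  # close to zero: the object is reused
print(refcount_delta(a, 1))  # the copy does not reference a
print(refcount_delta(a, 0))  # the "slice" is one more reference to a
```

The string is built at run time on purpose, so that it is an ordinary heap object whose reference count behaves normally.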

In theory, since strings are immutable, an implementation could reuse memory for substring slices. This would likely complicate any garbage collection based on reference counting, so it may not be a useful idea in practice. Consider the case where a small slice is taken from a much larger string: unless you implemented some kind of sub-reference counting on the slice, the memory of the much larger string could not be freed until the end of the substring's lifetime.
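To make that lifetime problem concrete, here is a toy model of a zero-copy slice. It is purely hypothetical: SubstringView and BigText are invented for this illustration and are not how any real implementation works. BigText exists only because a plain str cannot be weakly referenced, and the immediate cleanup relies on CPython's reference counting.

```python
import weakref


class BigText(str):
    """str subclass, used only so the parent can be weakly referenced."""


class SubstringView:
    """Hypothetical zero-copy slice: stores the parent plus two offsets."""

    def __init__(self, parent, start, stop):
        self.parent = parent  # strong reference keeps the whole parent alive
        self.start = start
        self.stop = stop

    def __str__(self):
        # Materialising the text is the only point where a copy happens.
        return self.parent[self.start:self.stop]


def parent_outlives_view():
    """Return (alive_while_view_exists, alive_after_view_is_gone)."""
    big = BigText("x" * 1_000_000)
    probe = weakref.ref(big)
    view = SubstringView(big, 0, 3)

    del big  # the program no longer needs the large string...
    alive_with_view = probe() is not None  # ...but the 3-char view pins it

    del view
    alive_without_view = probe() is not None
    return alive_with_view, alive_without_view


print(parent_outlives_view())
```

A three-character view keeps a million characters in memory, which is the trade-off that copying on slice avoids.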

For users who specifically need a standard type that can be sliced without copying the underlying data, there is memoryview. See "What exactly is the point of memoryview in Python" for more information about that.
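A short sketch of what that looks like in practice. Note that memoryview works on bytes-like objects rather than on str, so text has to be encoded first, and encoding is itself an allocation. The variable names are only illustrative.

```python
data = bytearray(b"abcdef")  # memoryview needs a bytes-like object, not a str
tail = memoryview(data)[3:]  # a view onto the last three bytes, no copy

# The view compares by content against other bytes-like objects.
matches_def = tail == b"def"
print(matches_def)

# Writing through the original buffer is visible through the view,
# which shows that both names refer to the same memory.
data[3] = ord("X")
tail_after_write = bytes(tail)
print(tail_after_write)

# A str has to be encoded first; the encoded bytes are a new object.
text_view = memoryview("abcdef".encode("ascii"))[3:]
print(text_view.tobytes())
```

This fits binary data well. For ordinary text comparisons the encoding step usually cancels out whatever the view saves.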

answered Sep 22 '22 05:09

wim

Possible talking point (feel free to edit this and add information).

I have just written this test to check empirically what the answer to the question might be (it is not meant to be a definitive answer).

    import sys

    a = "abcdefg"

    print("a id:", id(a))
    print("a[2:] id:", id(a[2:]))
    print("a[2:] is a:", a[2:] is a)

    print("Empty string memory size:", sys.getsizeof(""))
    print("a memory size:", sys.getsizeof(a))
    print("a[2:] memory size:", sys.getsizeof(a[2:]))

Output:

    a id: 139796109961712
    a[2:] id: 139796109962160
    a[2:] is a: False
    Empty string memory size: 49
    a memory size: 56
    a[2:] memory size: 54

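As we can see here:

- the size of an empty string object is 49 bytes
- a single character occupies 1 byte here (these are ASCII characters; CPython uses 1, 2, or 4 bytes per character depending on the widest character in the string)
- the ids of a and a[2:] are different
- the memory occupied by a (49 + 7 = 56 bytes) and by a[2:] (49 + 5 = 54 bytes) is what two independent strings of 7 and 5 characters would occupy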
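The one-byte-per-character figure holds for ASCII and Latin-1 text only. As a small sketch of how the per-character cost can be estimated on CPython (the helper name bytes_per_char is invented here, and sys.getsizeof results are implementation details):

```python
import sys


def bytes_per_char(ch):
    """Estimate storage per character by comparing two string lengths."""
    short = ch * 5
    long = ch * 10
    return (sys.getsizeof(long) - sys.getsizeof(short)) // 5


# ASCII letter, Greek alpha, and an emoji outside the Basic Multilingual Plane
for ch in ("a", "\u03b1", "\U0001F600"):
    print(repr(ch), bytes_per_char(ch))
```

Subtracting two sizes cancels out the fixed object header, so the result does not depend on the exact header size of a given Python version.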

answered Sep 19 '22 05:09

lorenzozane
