Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does string slicing perform copy in memory? [duplicate]

I'm wondering if :

a = "abcdef" b = "def" if a[3:] == b:     print("something") 

does actually perform a copy of the "def" part of a somewhere in memory, or if the letters checking is done in-place ?

Note : I'm speaking about a string, not a list (for which I know the answer)

like image 743
Fred Avatar asked Nov 17 '20 08:11

Fred


People also ask

Does slicing make a copy?

The short answer. Slicing lists does not generate copies of the objects in the list; it just copies the references to them. That is the answer to the question as asked.

Does python string slice make a copy?

Python does slice-by-copy, meaning every time you slice (except for very trivial slices, such as a[:] ), it copies all of the data into a new string object. The [slice-by-reference] approach is more complicated, harder to implement and may lead to unexpected behavior.

What is the purpose of string slicing?

There are multiple ways by which string slicing in python can allow us to reverse a string. Slicing is one of the ways that allows us to reverse a string.

Does slice create a deep copy?

All slices, like x_list[1:4] and the empty slice, are shallow copies. This makes sense; slicing operations just take part of an existing list, so it would be inefficient to create a deep copy and duplicate existing values. Slicing is just a command as to how to display data from the original list.


2 Answers

String slicing makes a copy in CPython.

Looking in the source, this operation is handled in unicodeobject.c:unicode_subscript. There is evidently a special-case to re-use memory when the step is 1, start is 0, and the entire content of the string is sliced - this goes into unicode_result_unchanged and there will not be a copy. However, the general case calls PyUnicode_Substring where all roads lead to a memcpy.

To empirically verify these claims, you can use a stdlib memory profiling tool tracemalloc:

# s.py import tracemalloc  tracemalloc.start() before = tracemalloc.take_snapshot() a = "." * 7 * 1024**2  # 7 MB of .....   # line 6, first alloc b = a[1:]                                # line 7, second alloc after = tracemalloc.take_snapshot()  for stat in after.compare_to(before, 'lineno')[:2]:     print(stat) 

You should see the top two statistics output like this:

/tmp/s.py:6: size=7168 KiB (+7168 KiB), count=1 (+1), average=7168 KiB /tmp/s.py:7: size=7168 KiB (+7168 KiB), count=1 (+1), average=7168 KiB 

This result shows two allocations of 7 meg, strong evidence of the memory copying, and the exact line numbers of those allocations will be indicated.

Try changing the slice from b = a[1:] into b = a[0:] to see that entire-string-special-case in effect: there should be only one large allocation now, and sys.getrefcount(a) will increase by one.

In theory, since strings are immutable, an implementation could re-use memory for substring slices. This would likely complicate any reference-counting based garbage collection process, so it may not be a useful idea in practice. Consider the case where a small slice from a much larger string was taken - unless you implemented some kind of sub-reference counting on the slice, the memory from the much larger string could not be freed until the end of the substring's lifetime.

For users that specifically need a standard type which can be sliced without copying the underlying data, there is memoryview. See What exactly is the point of memoryview in Python for more information about that.

like image 181
wim Avatar answered Sep 22 '22 05:09

wim


Possible talking point (feel free to edit adding information).

I have just written this test to verify empirically what the answer to the question might be (this cannot and does not want to be a certain answer).

import sys  a = "abcdefg"  print("a id:", id(a)) print("a[2:] id:", id(a[2:])) print("a[2:] is a:", a[2:] is a)  print("Empty string memory size:", sys.getsizeof("")) print("a memory size:", sys.getsizeof(a)) print("a[2:] memory size:", sys.getsizeof(a[2:])) 

Output:

a id: 139796109961712 a[2:] id: 139796109962160 a[2:] is a: False Empty string memory size: 49 a memory size: 56 a[2:] memory size: 54 

As we can see here:

  • the size of an empty string object is 49 bytes
  • a single character occupies 1 byte (Latin-1 encoding)
  • a and a[2:] ids are different
  • the occupied memory of each a and a[2:] is consistent with the memory occupied by a string with that number of char assigned
like image 31
lorenzozane Avatar answered Sep 19 '22 05:09

lorenzozane