I'm wondering if :
a = "abcdef" b = "def" if a[3:] == b: print("something")
does actually perform a copy of the "def" part of a
somewhere in memory, or if the letters checking is done in-place ?
Note : I'm speaking about a string, not a list (for which I know the answer)
The short answer. Slicing lists does not generate copies of the objects in the list; it just copies the references to them. That is the answer to the question as asked.
Python does slice-by-copy, meaning every time you slice (except for very trivial slices, such as a[:] ), it copies all of the data into a new string object. The [slice-by-reference] approach is more complicated, harder to implement and may lead to unexpected behavior.
There are multiple ways by which string slicing in python can allow us to reverse a string. Slicing is one of the ways that allows us to reverse a string.
All slices, like x_list[1:4] and the empty slice, are shallow copies. This makes sense; slicing operations just take part of an existing list, so it would be inefficient to create a deep copy and duplicate existing values. Slicing is just a command as to how to display data from the original list.
String slicing makes a copy in CPython.
Looking in the source, this operation is handled in unicodeobject.c:unicode_subscript
. There is evidently a special-case to re-use memory when the step is 1, start is 0, and the entire content of the string is sliced - this goes into unicode_result_unchanged
and there will not be a copy. However, the general case calls PyUnicode_Substring
where all roads lead to a memcpy
.
To empirically verify these claims, you can use a stdlib memory profiling tool tracemalloc
:
# s.py import tracemalloc tracemalloc.start() before = tracemalloc.take_snapshot() a = "." * 7 * 1024**2 # 7 MB of ..... # line 6, first alloc b = a[1:] # line 7, second alloc after = tracemalloc.take_snapshot() for stat in after.compare_to(before, 'lineno')[:2]: print(stat)
You should see the top two statistics output like this:
/tmp/s.py:6: size=7168 KiB (+7168 KiB), count=1 (+1), average=7168 KiB /tmp/s.py:7: size=7168 KiB (+7168 KiB), count=1 (+1), average=7168 KiB
This result shows two allocations of 7 meg, strong evidence of the memory copying, and the exact line numbers of those allocations will be indicated.
Try changing the slice from b = a[1:]
into b = a[0:]
to see that entire-string-special-case in effect: there should be only one large allocation now, and sys.getrefcount(a)
will increase by one.
In theory, since strings are immutable, an implementation could re-use memory for substring slices. This would likely complicate any reference-counting based garbage collection process, so it may not be a useful idea in practice. Consider the case where a small slice from a much larger string was taken - unless you implemented some kind of sub-reference counting on the slice, the memory from the much larger string could not be freed until the end of the substring's lifetime.
For users that specifically need a standard type which can be sliced without copying the underlying data, there is memoryview
. See What exactly is the point of memoryview in Python for more information about that.
Possible talking point (feel free to edit adding information).
I have just written this test to verify empirically what the answer to the question might be (this cannot and does not want to be a certain answer).
import sys a = "abcdefg" print("a id:", id(a)) print("a[2:] id:", id(a[2:])) print("a[2:] is a:", a[2:] is a) print("Empty string memory size:", sys.getsizeof("")) print("a memory size:", sys.getsizeof(a)) print("a[2:] memory size:", sys.getsizeof(a[2:]))
Output:
a id: 139796109961712 a[2:] id: 139796109962160 a[2:] is a: False Empty string memory size: 49 a memory size: 56 a[2:] memory size: 54
As we can see here:
a
and a[2:]
ids are differenta
and a[2:]
is consistent with the memory occupied by a string with that number of char assignedIf you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With