If you use string.split() on a Python string, it returns a list of strings. Each substring that is split out is a copy of its part of the parent string.
Is it possible to instead get some cheaper slice object that holds only a reference, offset and length to the bits split out?
And is it possible to have some 'string view' to extract and treat these sub-strings as if they are strings yet without making a copy of their bytes?
(I ask as I have very large strings I want to slice and am running out of memory occasionally; removing the copies would be a cheap profile-guided win.)
buffer() will give you a read-only view on a string.
>>> s = 'abcdefghijklmnopqrstuvwxyz'
>>> b = buffer(s, 2, 10)
>>> b
<read-only buffer for 0x7f935ee75d70, size 10, offset 2 at 0x7f935ee5a8f0>
>>> b[:]
'cdefghijkl'
In CPython, string objects always point to a NUL-terminated buffer, so a substring cannot share its parent's memory and must be copied. As Ignacio pointed out, you can use buffer() to get a read-only view on the string memory. The buffer() built-in function has been superseded by the more versatile memoryview objects, though, which are available in Python 2.7 and 3.x (buffer() is gone in Python 3.x). Note that in Python 3, memoryview works on bytes-like objects such as bytes and bytearray, not on str.
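# b"..." is a byte string on both Python 2.7 and 3.x; memoryview() needs a bytes-like object.
s = b"abcd" * 50
view = memoryview(s)      # no copy: a view over the bytes of s
subview = view[10:20]     # still no copy: slicing a view yields another view
# tobytes() copies the 10 selected bytes; decode() turns them into text for printing.
print(subview.tobytes().decode("ascii"))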
This code prints
cdabcdabcd
As soon as you call tobytes(), a copy of the string will be created, but the same happens when slicing the old buffer objects as in Ignacio's answer.
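To tie this back to the original question about split(), here is a minimal sketch for Python 3 of a splitter that yields memoryview slices instead of copies. The function name split_views is hypothetical (it is not part of the standard library), and the sketch assumes the data is held as bytes rather than str, since str does not expose the buffer protocol in Python 3.

```python
def split_views(data, sep=b" "):
    """Yield memoryview slices of `data` between occurrences of `sep`.

    Hypothetical helper: behaves like data.split(sep), but each piece is a
    view into `data` rather than a new bytes object.
    """
    view = memoryview(data)
    start = 0
    while True:
        end = data.find(sep, start)   # search the original bytes, no copy
        if end == -1:
            yield view[start:]        # last piece: everything after the final separator
            return
        yield view[start:end]         # slicing a memoryview does not copy the bytes
        start = end + len(sep)


data = b"alpha beta gamma"
parts = list(split_views(data))

# Converting to bytes copies, so do it only for the pieces you actually need.
print([bytes(p) for p in parts])
```

Each view is itself a small Python object, so the saving is likely to matter only when the pieces are large compared with that per-object overhead. The views also keep the parent bytes object alive for as long as any of them exists.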