
Slices of immutable strings by reference rather than by copy

Tags: python

Calling split() on a Python string returns a list of strings. Each of these substrings is a copy of its part of the parent string.

Is it possible to instead get some cheaper slice object that holds only a reference, offset and length to the bits split out?

And is it possible to have some 'string view' to extract and treat these sub-strings as if they are strings yet without making a copy of their bytes?

(I ask as I have very large strings I want to slice and am running out of memory occasionally; removing the copies would be a cheap profile-guided win.)

Will asked Apr 10 '12 08:04



2 Answers

The built-in buffer() (Python 2 only) will give you a read-only view on a string.

>>> s = 'abcdefghijklmnopqrstuvwxyz'
>>> b = buffer(s, 2, 10)
>>> b
<read-only buffer for 0x7f935ee75d70, size 10, offset 2 at 0x7f935ee5a8f0>
>>> b[:]
'cdefghijkl'
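
For readers on Python 3, where buffer() is not available, a rough equivalent of the session above might look like the following sketch. It assumes the data is held as bytes rather than str, since memoryview only accepts objects that support the buffer protocol; the variable names simply mirror the ones above.

```python
# Python 3 sketch: a read-only, zero-copy window onto a bytes object.
s = b"abcdefghijklmnopqrstuvwxyz"

# Offset 2, length 10, like buffer(s, 2, 10) in Python 2.
b = memoryview(s)[2:12]

print(b.readonly)   # True, because bytes objects are immutable
print(len(b))       # 10
print(bytes(b))     # b'cdefghijkl' (this call is where a copy is made)
```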
Ignacio Vazquez-Abrams answered Oct 21 '22 15:10



In CPython, a string object always points to a NUL-terminated buffer, so a substring cannot share memory with its parent and must be copied. As Ignacio pointed out, you can use buffer() to get a read-only view on the string memory. The buffer() built-in function has been superseded by the more versatile memoryview objects, though, which are available in Python 2.7 and 3.x (buffer() is gone in Python 3.x).

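# Python 2.7 (in Python 3, memoryview() needs a bytes-like object, not str)
s = "abcd" * 50
view = memoryview(s)
subview = view[10:20]   # a view into s; nothing is copied yet
print subview.tobytes()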

This code prints

cdabcdabcd

As soon as you call tobytes(), a copy of the string will be created, but the same happens when slicing the old buffer objects as in Ignacio's answer.
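
To connect this back to the split() case in the question, here is a minimal Python 3 sketch of a splitter that yields memoryview slices instead of copied substrings. The name split_views is a hypothetical helper, not a standard-library function, and the sketch assumes the data is available as bytes (for text you would have to encode it first, and offsets are then byte offsets, not character offsets).

```python
def split_views(data, sep=b" "):
    """Yield memoryview slices of `data` between occurrences of `sep`.

    Hypothetical helper for illustration. Unlike bytes.split(), the
    pieces share the memory of `data`; no substring bytes are copied.
    """
    if not sep:
        raise ValueError("empty separator")
    view = memoryview(data)
    start = 0
    while True:
        end = data.find(sep, start)
        if end == -1:
            # No more separators: the rest of the data is the last piece.
            yield view[start:]
            return
        yield view[start:end]
        start = end + len(sep)


text = b"alpha beta gamma"
parts = list(split_views(text))

# Each piece still refers to the original object rather than a copy.
print(all(p.obj is text for p in parts))   # True

# Copies are made only when a piece is converted back to bytes.
print([bytes(p) for p in parts])           # [b'alpha', b'beta', b'gamma']
```

Each memoryview is itself a small Python object with a fixed overhead, so an approach like this is likely to pay off mainly when the individual pieces are large, as in the question.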

Sven Marnach answered Oct 21 '22 13:10
