I have a huge str of ~1GB in length:
>>> len(L)
1073741824
I need to take many pieces of the string from specific indexes until the end of the string. In C I'd do:
char* L = ...;
char* p1 = L + start1;
char* p2 = L + start2;
...
But in Python, slicing a string creates a new str instance using more memory:
>>> id(L)
140613333131280
>>> p1 = L[10:]
>>> id(p1)
140612259385360
To save memory, how do I create an str-like object that is in fact a pointer to the original L?
Edit: we have buffer (Python 2 only) and memoryview (Python 2.7 and Python 3), but memoryview does not exhibit the same interface as an str or bytes:
>>> L = b"0" * 1000
>>> a = memoryview(L)
>>> b = memoryview(L)
>>> a < b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: memoryview() < memoryview()
>>> type(b'')
<class 'bytes'>
>>> b'' < b''
False
>>> b'0' < b'1'
True
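There is a memoryview type. The session below is Python 2.7; in Python 3 you must pass a bytes-like object such as b'potato', and indexing returns integers rather than one-character strings: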
>>> v = memoryview('potato')
>>> v[2]
't'
>>> v[-1]
'o'
>>> v[1:4]
<memory at 0x7ff0876fb808>
>>> v[1:4].tobytes()
'ota'
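For Python 3, the same idea might look like the sketch below. It assumes the data can be held as bytes (a str would have to be encoded first). The name suffix_less_than is a hypothetical helper, not a standard-library function; it orders two views lexicographically without copying them, which works around the missing < operator on memoryview.

```python
data = b"potato" * 4          # small stand-in for the 1 GB buffer

view = memoryview(data)
p1 = view[10:]                # zero-copy: p1 refers to data's memory
p2 = view[15:]

assert p1.obj is data         # the slice still points at the original object
assert p1[0] == data[10]      # indexing a bytes view yields an int in Python 3


def suffix_less_than(a, b):
    """Lexicographic a < b for two memoryviews, compared item by item."""
    for x, y in zip(a, b):
        if x != y:
            return x < y
    return len(a) < len(b)    # equal common prefix: the shorter view sorts first


print(suffix_less_than(p2, p1))
```

Equality (==) is already supported between memoryview objects in Python 3; only the ordering comparisons are missing, so a helper like this is needed only for sorting or < tests.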
If you need to work on a string, use iterators to access the data without duplicating the content in memory. Your tools of the trade would be itertools.tee and itertools.islice:
>>> from itertools import tee, islice
>>> L = "Random String of data"
>>> p1, p2 = tee(L)
>>> p1 = islice(p1, 10, None)
>>> p2 = islice(p2, 15, None)
>>> ''.join(p1)  # joining creates a new string (a copy)
'ing of data'
>>> ''.join(p2)  # joining creates a new string (a copy)
'f data'
In a loose sense this does yield a pointer but, unlike in C/C++, it is only a forward pointer, that is, an iterator.
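One caveat with tee and islice: tee buffers the items one iterator has consumed until the other catches up, and islice has to step through every skipped item. A minimal alternative sketch, using a hypothetical generator iter_from (not part of itertools), indexes the original string directly, so no slice of the string is copied and nothing is buffered:

```python
def iter_from(s, start):
    """Yield s[start], s[start + 1], ... without slicing s."""
    for i in range(start, len(s)):
        yield s[i]


L = "Random String of data"
p1 = iter_from(L, 10)
p2 = iter_from(L, 15)

print(next(p1))        # i
print(''.join(p2))     # f data (join still builds a new str)
```

Each call to iter_from starts immediately at the requested index, and the iterators are independent of one another.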
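Note: of course, you need to take due care when using forward iterators, namely:
An iterator can only be consumed once, so save a position before advancing it. itertools.tee is useful here, as in p1, p_saved = tee(p1).
You can read the content one item at a time with next(p1), or as a string with ''.join(p1), but because Python strings are immutable, every time you need a string view you are handed a copy.
To compare two views without copying, instead of ''.join(p1) == ''.join(p2) you need to do the following: all(a == b for a, b in izip(p1, p2)). Here izip is itertools.izip from Python 2; in Python 3 use the built-in zip.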
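In Python 3, izip is gone and the built-in zip is already lazy. Note also that zip stops at the shorter input, so the all(...) test above would treat a string and its proper prefix as equal. A hedged sketch of a length-aware comparison follows; iter_equal and _MISSING are hypothetical names introduced for illustration:

```python
from itertools import islice, zip_longest

_MISSING = object()   # sentinel marking the end of the shorter iterator


def iter_equal(it1, it2):
    """True if both iterables yield equal items and have the same length."""
    return all(a == b for a, b in zip_longest(it1, it2, fillvalue=_MISSING))


L = "abcabc"
print(iter_equal(islice(L, 0, 3), islice(L, 3, None)))   # True
print(iter_equal(islice(L, 0, 2), islice(L, 3, None)))   # False
```

The comparison stops at the first mismatch, so no joined copy of either view is ever built.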