
Python str view

I have a huge str of ~1GB in length:

>>> len(L)
1073741824

I need to take many pieces of the string from specific indexes until the end of the string. In C I'd do:

char* L = ...;
char* p1 = L + start1;
char* p2 = L + start2;
...

But in Python, slicing a string creates a new str instance using more memory:

>>> id(L)
140613333131280
>>> p1 = L[10:]
>>> id(p1)
140612259385360
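
Differing ids only show that the two objects are distinct. A rough way to see that the slice also carries its own character data is to compare reported sizes. This is a minimal sketch assuming CPython, where sys.getsizeof includes the string's character buffer; the 1 MiB string is a stand-in for the 1 GB one:

```python
import sys

# 1 MiB stand-in for the 1 GB string from the question.
L = "x" * (1 << 20)
p1 = L[10:]

# The slice is a separate object with its own copy of the characters,
# so its reported size is close to that of the original.
print(sys.getsizeof(L), sys.getsizeof(p1))
```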

To save memory, how do I create a str-like object that is in fact a pointer to the original L?

Edit: we have buffer in Python 2 and memoryview in Python 2.7 and Python 3, but memoryview does not expose the same interface as str or bytes:

>>> L = b"0" * 1000
>>> a = memoryview(L)
>>> b = memoryview(L)
>>> a < b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: memoryview() < memoryview()

>>> type(b'')
<class 'bytes'>
>>> b'' < b''
False
>>> b'0' < b'1'
True
asked Nov 19 '14 09:11 by vz0



2 Answers

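There is a memoryview type. Slicing a memoryview does not copy the underlying data; only tobytes() makes a copy. The session below is from Python 2; in Python 3, memoryview needs a bytes-like object such as b'potato', and (since 3.3) indexing returns an int: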

>>> v = memoryview('potato')
>>> v[2]
't'
>>> v[-1]
'o'
>>> v[1:4]
<memory at 0x7ff0876fb808>
>>> v[1:4].tobytes()
'ota'
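
For Python 3, where memoryview objects are not orderable, one option is to wrap the view in a small class that supplies the comparisons. The following is only a sketch under stated assumptions: SuffixView and its CHUNK attribute are hypothetical names, not part of the standard library, and the data is assumed to be bytes (a str would have to be encoded first, which costs one copy). Ordering compares a few kilobytes at a time, so no full-length copy is ever made:

```python
from functools import total_ordering


@total_ordering
class SuffixView:
    """Hypothetical helper: an orderable, zero-copy view of data[start:]."""

    CHUNK = 4096  # number of bytes copied at a time while comparing

    def __init__(self, data, start=0):
        # Slicing a memoryview shares the underlying buffer; nothing is copied.
        self.view = memoryview(data)[start:]

    def __len__(self):
        return len(self.view)

    def __getitem__(self, index):
        # An int index returns an int; a slice returns another memoryview.
        return self.view[index]

    def __eq__(self, other):
        if not isinstance(other, SuffixView):
            return NotImplemented
        return self.view == other.view

    def __lt__(self, other):
        if not isinstance(other, SuffixView):
            return NotImplemented
        # Compare chunk by chunk; only small temporary bytes objects are made.
        for i in range(0, min(len(self), len(other)), self.CHUNK):
            a = self.view[i:i + self.CHUNK].tobytes()
            b = other.view[i:i + self.CHUNK].tobytes()
            if a != b:
                return a < b
        # One view is a prefix of the other: the shorter one sorts first.
        return len(self) < len(other)


data = b"banana"
# Sort suffix start positions by the suffix they point at.
order = sorted(range(len(data)), key=lambda i: SuffixView(data, i))
print(order)  # [5, 3, 1, 0, 4, 2]
```

Each SuffixView holds only a reference to the original buffer plus an offset, which is close to the char* p = L + start idiom from the question.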
answered Oct 19 '22 13:10 by wim


If you need to work on a string, use iterators to access the data without duplicating the content in memory.

Your tools of the trade would be itertools.tee and itertools.islice:

>>> from itertools import tee, islice
>>> L = "Random String of data"
>>> p1, p2 = tee(L)
>>> p1 = islice(p1, 10, None)
>>> p2 = islice(p2, 15, None)
>>> ''.join(p1)  # join() builds a new string, i.e. a copy
'ing of data'
>>> ''.join(p2)  # join() builds a new string, i.e. a copy
'f data'

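In a literal sense this yields a pointer but, unlike a pointer in C/C++, it is only a forward pointer (an iterator): it cannot move backwards, and islice still has to step over the skipped items one by one.

Note: of course, you need to take care when using forward iterators, namely: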

  1. Save the pointer before advancing it. itertools.tee is useful here, as in p1, p_saved = tee(p1).
  2. You can read a single character with next(p1) or a whole string with ''.join(p1), but because Python strings are immutable, every time you need a string view you get a copy.
  3. Since you can read single characters, your algorithms should work on the iterators rather than materialize strings. For example, to compare two iterators, instead of comparing the content with ''.join(p1) == ''.join(p2), use all(a == b for a, b in izip(p1, p2)) (izip is Python 2; use the built-in zip in Python 3). Note that this comparison stops at the end of the shorter iterator.
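
The three points above can be sketched for Python 3 roughly as follows. The helper names suffix_iter and iter_equal are made up for illustration. iter_equal swaps in itertools.zip_longest for izip so that iterators of different lengths do not compare equal; keep in mind that tee buffers every item one of its iterators has consumed and the other has not:

```python
from itertools import islice, tee, zip_longest


def suffix_iter(s, start):
    """Lazily iterate over s[start:]; islice still steps past `start` items."""
    return islice(s, start, None)


def iter_equal(it1, it2):
    """Compare two iterators item by item, consuming both."""
    missing = object()  # sentinel marking the end of the shorter iterator
    return all(a == b for a, b in zip_longest(it1, it2, fillvalue=missing))


L = "Random String of data"

# 1. Save the pointer before advancing it.
p1, p_saved = tee(suffix_iter(L, 10))

# 2. Read a single character, then the rest as a new string (a copy).
first = next(p1)      # 'i'
rest = ''.join(p1)    # 'ng of data'

# p_saved was not advanced, so it still yields the full suffix.
saved = ''.join(p_saved)  # 'ing of data'

# 3. Compare two suffixes without materializing either as a string.
same = iter_equal(suffix_iter("abc", 1), suffix_iter("xbc", 1))  # True
prefix = iter_equal(iter("ab"), iter("abc"))                      # False
```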
answered Oct 19 '22 11:10 by Abhijit