
Python bytes object from generator

Let's say I have a generator like

gen = (i*2 for i in range(100))

and I now want to create a bytes object containing all the values that generator yields. I could do the following:

b = bytes(gen)

My question now is: since bytes objects are immutable, how does the memory allocation work in this case? Do I have to assume that for every element the generator yields, a new bytes object is created, with the previous content plus one more element copied into it? This would be very inefficient, especially for longer generators. And since the generator does not provide any length information, it seems there wouldn't be any other way of pre-allocating the needed memory internally.

Then again, what would be a better way to achieve this with as little memory usage as possible? Should I fill a (mutable) bytearray first and then convert that into a bytes object?

b = bytes(bytearray(gen))

Or even a list?

b = bytes(list(gen))

But that looks somehow strange and counter-intuitive...


Background: The specific generator I have reads bytes (as Python integers in 0..255) one at a time over a C-API from another module (.pyd). The overall length of the sequence is known beforehand and can be up to 2**25 bytes. My readout function should collect those values and return a bytes object, which I thought was appropriate since the data is read-only.

Jeronimo asked Sep 20 '17 09:09


1 Answer

bytes(iterator) creates the bytes object from the iterator using the internal C-API function _PyBytes_FromIterator, which uses the special _PyBytesWriter protocol. Internally it uses a buffer that is resized whenever it overflows, according to the rule:

bufsize += bufsize / OVERALLOCATE_FACTOR

On Linux OVERALLOCATE_FACTOR is 4 (the buffer grows by 25% each time); on Windows OVERALLOCATE_FACTOR is 2 (the buffer grows by 50% each time).
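To get a rough feel for what this growth rule means for a large input, here is a small pure-Python simulation. It is only a sketch, not CPython's code: the function name count_resizes is made up for illustration, the initial buffer size of 64 is an assumption, and so is the detail that each resize asks for one byte more than the current capacity before over-allocating.

```python
def count_resizes(n_items, factor, initial=64):
    """Estimate how often a geometrically growing buffer is resized.

    `initial` (64) and the "+ 1" request are assumptions made for this
    sketch; only the `size += size // factor` rule comes from the answer.
    """
    bufsize = initial
    resizes = 0
    while bufsize < n_items:
        needed = bufsize + 1                   # room for one more byte
        bufsize = needed + needed // factor    # over-allocate on top of that
        resizes += 1
    return resizes


if __name__ == "__main__":
    n = 2**25  # upper bound mentioned in the question
    print("factor 4:", count_resizes(n, 4))
    print("factor 2:", count_resizes(n, 2))
```

Because the buffer grows geometrically, the number of resizes grows only logarithmically with the number of elements, so it stays in the tens even for 2**25 elements rather than one reallocation per element.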

In other words, the process resembles writing to a file in RAM: no new bytes object is created for each element. At the end, the contents of the buffer are returned.
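Since the question mentions that the total length is known beforehand, one possible alternative is to allocate a bytearray of that size once and fill it in place. The following is a minimal sketch under that assumption; collect_bytes and its parameters are hypothetical names, not part of any library.

```python
def collect_bytes(values, total_length):
    """Fill a pre-sized bytearray from an iterable of ints in 0..255.

    Hypothetical helper: assumes the caller knows `total_length` up front.
    """
    buf = bytearray(total_length)  # zero-filled, allocated once
    written = 0
    for written, value in enumerate(values, start=1):
        # Raises IndexError if the iterable yields more than total_length
        # items, and ValueError if a value is outside 0..255.
        buf[written - 1] = value
    if written != total_length:
        raise ValueError(f"expected {total_length} values, got {written}")
    return bytes(buf)  # one final copy into an immutable bytes object


if __name__ == "__main__":
    gen = (i * 2 for i in range(100))
    b = collect_bytes(gen, 100)
    print(len(b))
```

This avoids intermediate resizes, but the final bytes(buf) call still copies the data once, so peak memory is roughly twice the payload. Whether that beats plain bytes(gen) in practice would need to be measured.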

intellimath answered Oct 02 '22 18:10