
How the write(), read() and getvalue() methods of Python io.BytesIO work?

Tags: python, bytesio

I'm trying to understand the write() and read() methods of io.BytesIO. My understanding was that I could use the io.BytesIO as I would use a File object.

import io
in_memory = io.BytesIO(b'hello')
print( in_memory.read() )

The above code prints b'hello' as expected, but the code below prints an empty bytes object, b''.

import io
in_memory = io.BytesIO(b'hello')
in_memory.write(b' world')
print( in_memory.read() )

My questions are:

- What is io.BytesIO.write(b' world') doing exactly?

- What is the difference between io.BytesIO.read() and io.BytesIO.getvalue()?

I assume that the answer is related to io.BytesIO being a stream object, but the big picture is not clear to me.

Robert asked Nov 26 '18 16:11




3 Answers

The issue is that you are positioned at the end of the stream. Think of the position like a cursor. Once you have written b' world', your cursor is at the end of the stream. When you try to .read(), you are reading everything after the position of the cursor - which is nothing, so you get the empty bytestring.

To navigate around the stream you can use the .seek method:

>>> import io
>>> in_memory = io.BytesIO(b'hello')
>>> in_memory.write(b' world')
6
>>> in_memory.seek(0)  # go to the start of the stream
0
>>> print(in_memory.read())
b' world'

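Note that the initial bytes b'hello' have been overwritten by your writing of b' world'. A BytesIO created with initial bytes starts with its position at 0, so it behaves like a file opened in 'r+b' mode: writes replace the existing bytes in place rather than appending. (A file opened in 'w' mode would instead be truncated to empty on opening.)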
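For comparison, here is a minimal sketch of the same write against a real file. It assumes a scratch file created with tempfile.mkstemp (the path is arbitrary and not part of the question). It shows that 'r+b' overwrites in place from position 0, while 'wb' truncates the file first.

```python
import os
import tempfile

# Create a scratch file holding b'hello' (an arbitrary temp path).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'hello')

# 'r+b' keeps the existing bytes and starts at position 0,
# so the write overwrites in place, as with io.BytesIO(b'hello').
with open(path, 'r+b') as f:
    f.write(b' world')
    f.seek(0)
    in_place = f.read()

# 'wb' truncates the file to zero length before anything is written.
with open(path, 'wb') as f:
    f.write(b'!')
with open(path, 'rb') as f:
    truncated = f.read()

os.remove(path)
print(in_place)   # b' world'
print(truncated)  # b'!'
```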

.getvalue() just returns the entire contents of the stream regardless of the current position, and it does not move the position.
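A small sketch of that difference, using tell() to watch the position (the variable names are only for illustration): read() consumes from the current position and advances it, while getvalue() ignores the position entirely.

```python
import io

stream = io.BytesIO(b'hello')

first_read = stream.read(2)   # b'he'; the position moves to 2
position = stream.tell()      # 2
rest = stream.read()          # b'llo'; the position moves to 5 (the end)
at_end = stream.read()        # b'' because nothing is left after the position
whole = stream.getvalue()     # b'hello'; the position is ignored
still_at_end = stream.tell()  # 5; getvalue() did not move it
```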

johnpaton answered Oct 03 '22 18:10


This is an in-memory stream, but still a stream. The position is stored, so, as with any other stream, if you want to read after having written you have to reposition first:

import io
in_memory = io.BytesIO(b'hello')
in_memory.seek(0,2)   # seek to end, else we overwrite
in_memory.write(b' world')
in_memory.seek(0)    # seek to start
print( in_memory.read() )

prints:

b'hello world'

in_memory.getvalue(), by contrast, doesn't need the final seek(0), because it always returns the contents of the stream from position 0.
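The second argument to seek() is the "whence" value, and the io module provides named constants for it, which may read more clearly than a bare 2. A short sketch (variable names are illustrative) that also shows what seek() and write() return:

```python
import io

stream = io.BytesIO(b'hello')

end = stream.seek(0, io.SEEK_END)     # same as seek(0, 2); returns the new position, 5
written = stream.write(b' world')     # returns the number of bytes written, 6
back = stream.seek(-5, io.SEEK_CUR)   # 5 bytes back from the current position: 11 - 5 = 6
tail = stream.read()                  # b'world'
start = stream.seek(0)                # whence defaults to io.SEEK_SET, i.e. 0
```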

Jean-François Fabre answered Oct 02 '22 18:10


BytesIO does behave like a file, only one that you can both read and write. The confusing part, maybe, is that reading and writing share a single position. So first you do:

in_memory = io.BytesIO(b'hello')

This gives you a bytes buffer in in_memory with the contents b'hello' and with the read/write position at the beginning (before the first b'h'). When you do:

in_memory.write(b' world')

You are effectively overwriting the 5 bytes of b'hello' with the 6 bytes of b' world' (so the buffer grows by one byte), and now you have the position at the end (after the last b'd'). So when you do:

print( in_memory.read() )

You get b'' because there is nothing to read after the current position. You can, however, use seek to move the position, so if you do:

import io
in_memory = io.BytesIO(b'hello')
in_memory.write(b' world')
in_memory.seek(0)
print( in_memory.read() )

You get:

b' world'
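To make the in-place overwrite easier to see, here is a sketch with a write that is shorter than the existing contents (the b'hello world' starting value is just an example): only the bytes under the write are replaced, and the buffer grows only when the write runs past the end.

```python
import io

stream = io.BytesIO(b'hello world')
stream.write(b'J')          # replaces only the first byte; the position is now 1
short = stream.getvalue()   # b'Jello world'

stream = io.BytesIO(b'hello')
stream.write(b' world')     # 6 bytes over a 5-byte buffer: the buffer grows by one
grown = stream.getvalue()   # b' world'
length = len(grown)         # 6
```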

Note that you do not see the initial b'hello' because it was overwritten. If you want to write after the initial content, you can first seek to the end:

import io
in_memory = io.BytesIO(b'hello')
in_memory.seek(0, 2)
in_memory.write(b' world')
in_memory.seek(0)
print( in_memory.read() )

Output:

b'hello world'

EDIT: About getvalue, as pointed out by other answers, it gives you the full contents of the internal buffer, independently of the current position. Ordinary file objects have no such method.
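If you want something similar for a real file, one possible approach is to save the position, read everything from the start, and then restore the position. This is only a sketch: read_all_preserving_position is a hypothetical helper name, not part of the io module, and it assumes a seekable file opened in binary mode.

```python
import tempfile

def read_all_preserving_position(f):
    """Return the full contents of a seekable binary file without moving its position."""
    saved = f.tell()   # remember where we were
    f.seek(0)
    data = f.read()
    f.seek(saved)      # go back, so the caller sees no change
    return data

with tempfile.TemporaryFile() as f:   # opened in 'w+b' mode by default
    f.write(b'hello world')
    f.seek(5)
    everything = read_all_preserving_position(f)
    position_after = f.tell()

print(everything)      # b'hello world'
print(position_after)  # 5
```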

jdehesa answered Oct 02 '22 18:10