I'm trying to understand the write() and read() methods of io.BytesIO. My understanding was that I could use the io.BytesIO as I would use a File object.
import io
in_memory = io.BytesIO(b'hello')
print( in_memory.read() )
The above code prints b'hello' as expected, but the code below prints an empty bytes object, b''.
import io
in_memory = io.BytesIO(b'hello')
in_memory.write(b' world')
print( in_memory.read() )
My questions are:
-What is io.BytesIO.write(b' world') doing exactly?
-What is the difference between io.BytesIO.read() and io.BytesIO.getvalue()?
I assume that the answer is related to io.BytesIO being a stream object, but the big picture is not clear to me.
Besides the performance gain, using BytesIO instead of concatenating has the advantage that BytesIO can be used in place of a file object. So say you have a function that expects a file object to write to. Then you can give it that in-memory buffer instead of a file.
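As a minimal sketch of that idea: write_report below is a hypothetical helper (it is not from this thread) that only assumes its argument has a write() method accepting bytes, so an in-memory buffer can stand in for an open file.

```python
import io

def write_report(fileobj, rows):
    """Hypothetical helper: write one line per row to any binary file-like object."""
    for row in rows:
        fileobj.write(row + b'\n')

# Pass an in-memory buffer where a file opened in binary mode would normally go.
buffer = io.BytesIO()
write_report(buffer, [b'alpha', b'beta'])

result = buffer.getvalue()  # everything written so far, regardless of position
print(result)
```

The same function would work unchanged with a real file opened via open(path, 'wb').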
The issue is that you are positioned at the end of the stream. Think of the position like a cursor. Once you have written b' world', your cursor is at the end of the stream. When you try to .read(), you are reading everything after the position of the cursor - which is nothing, so you get the empty bytestring.
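One way to watch the cursor move is to call tell() before and after each operation. This is only an illustrative sketch; the variable names are made up for the example.

```python
import io

in_memory = io.BytesIO(b'hello')
start = in_memory.tell()              # position right after construction
written = in_memory.write(b' world')  # write() returns the number of bytes written
after_write = in_memory.tell()        # position once the write has finished
leftover = in_memory.read()           # reads from the current position onward

print(start, written, after_write, leftover)
```

Here the position goes from 0 to 6, which is the end of the buffer, so the read has nothing left to return.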
To navigate around the stream you can use the .seek method:
>>> import io
>>> in_memory = io.BytesIO(b'hello')
>>> in_memory.write(b' world')
6
>>> in_memory.seek(0) # go to the start of the stream
0
>>> print(in_memory.read())
b' world'
Note that, just like a file opened in update ('r+') mode, the initial bytes b'hello' have been overwritten by your writing of b' world', because the write started at position 0.
.getvalue() just returns the entire contents of the stream regardless of the current position.
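A small sketch of the difference, using an arbitrary cursor position chosen for illustration:

```python
import io

buf = io.BytesIO(b'hello world')
buf.seek(6)              # move the cursor just past b'hello '

tail = buf.read()        # only the bytes after the cursor
whole = buf.getvalue()   # the full contents; the cursor is ignored
pos_after = buf.tell()   # getvalue() did not move the cursor; read() left it at the end

print(tail, whole, pos_after)
```

So read() depends on, and advances, the position, while getvalue() does neither.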
This is a memory stream, but still a stream. The position is stored, so as with any other stream, if you try to read after having written, you have to reposition first:
import io
in_memory = io.BytesIO(b'hello')
in_memory.seek(0,2) # seek to end, else we overwrite
in_memory.write(b' world')
in_memory.seek(0) # seek to start
print( in_memory.read() )
prints:
b'hello world'
while in_memory.getvalue() doesn't need the final seek(0), as it returns the contents of the stream from position 0.
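The 2 passed to seek(0, 2) is the whence argument. As a possible variation on the snippet above, the named constant io.SEEK_END can be used instead, and the return value of seek() gives the new absolute position:

```python
import io

in_memory = io.BytesIO(b'hello')
end = in_memory.seek(0, io.SEEK_END)  # same as seek(0, 2): offset 0 from the end
in_memory.write(b' world')            # appended, because the cursor was at the end
contents = in_memory.getvalue()       # no seek(0) needed before this call

print(end, contents)
```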
BytesIO does behave like a file, only one that you can both read and write. The confusing part, maybe, is that the reading and writing position is the same one. So first you do:
in_memory = io.BytesIO(b'hello')
This gives you a bytes buffer in in_memory with the contents b'hello' and with the read/write position at the beginning (before the first b'h'). When you do:
in_memory.write(b' world')
You are effectively overwriting b'hello' with b' world' (and actually going one byte further, since the new data is six bytes long), and now the position is at the end (after the last b'd'). So when you do:
print( in_memory.read() )
You see nothing because there is nothing to read after the current position. You can, however, use seek to move the position, so if you do:
import io
in_memory = io.BytesIO(b'hello')
in_memory.write(b' world')
in_memory.seek(0)
print( in_memory.read() )
You get:
b' world'
Note that you do not see the initial b'hello' because it was overwritten. If you want to write after the initial content, you can first seek to the end:
import io
in_memory = io.BytesIO(b'hello')
in_memory.seek(0, 2)
in_memory.write(b' world')
in_memory.seek(0)
print( in_memory.read() )
Output:
b'hello world'
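To make the overwriting behaviour described earlier more visible, here is a small sketch comparing a write that is shorter than the existing content with one that is longer. The one-byte write is just an illustrative choice.

```python
import io

# A short write at position 0 replaces only the bytes it covers.
in_memory = io.BytesIO(b'hello')
in_memory.write(b'J')
short_write = in_memory.getvalue()

# A longer write replaces all five bytes and extends the buffer by one.
in_memory = io.BytesIO(b'hello')
in_memory.write(b' world')
long_write = in_memory.getvalue()

print(short_write, long_write)
```

In other words, write() does not insert or truncate; it overwrites in place from the current position and grows the buffer only when it runs past the end.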
EDIT: About getvalue, as pointed out by other answers, it gives you the full internal buffer, independently of the current position. This operation is obviously not available for files.