I'm learning about working with streams in Python and I noticed that the IO docs say the following:
The easiest way to create a binary stream is with open() with 'b' in the mode string:

```python
f = open("myfile.jpg", "rb")
```

In-memory binary streams are also available as BytesIO objects:

```python
f = io.BytesIO(b"some initial binary data: \x00\x01")
```
What is the difference between f as defined by open() and f as defined by io.BytesIO()? In other words, what makes an "in-memory binary stream", and how is that different from what open() does?
Python's io module lets us manage file-related input and output operations. The advantage of the module is that its classes and functions cover text streams (with Unicode encoding and decoding), binary streams, and raw streams under one common interface.
Binary I/O (also called buffered I/O) expects bytes-like objects and produces bytes objects. No encoding, decoding, or newline translation is performed. This category of streams can be used for all kinds of non-text data, and also when manual control over the handling of text data is desired.
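The "no encoding, decoding, or newline translation" point is easy to see side by side; a small sketch (the sample bytes are made up for illustration):

```python
import io

raw = b"line1\r\nline2\n"

# A binary stream produces bytes objects unchanged: no encoding,
# decoding, or newline translation is performed.
binary = io.BytesIO(raw)
binary_data = binary.read()
print(binary_data)

# A text stream over the same bytes decodes them and, with the
# default universal-newlines mode, turns b"\r\n" into "\n".
text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii")
text_data = text.read()
print(text_data)
```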
Note that close() is largely a convenience for routines that accept a file-like object and eventually attempt to close it; for an in-memory stream there is no operating-system resource to release, so there is usually no need to call it yourself.
For simplicity's sake, let's consider writing instead of reading for now.
So when you use open(), say:

```python
with open("test.dat", "wb") as f:
    f.write(b"Hello World")
    f.write(b"Hello World")
    f.write(b"Hello World")
```
After executing that, a file called test.dat will be created, containing Hello World three times. The data won't be kept in memory after it's written to the file (unless a name keeps a reference to it).
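You can check that against a real file; a sketch of my own, using a temporary directory so nothing is left behind:

```python
import os
import tempfile

# Write three chunks to a real file on disk, then read them back to
# confirm exactly what open() put there.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "test.dat")   # scratch copy of test.dat
    with open(path, "wb") as f:
        f.write(b"Hello World")
        f.write(b"Hello World")
        f.write(b"Hello World")
    size = os.path.getsize(path)         # 3 * 11 = 33 bytes on disk
    with open(path, "rb") as f:
        data = f.read()
print(size, data)
```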
Now consider io.BytesIO() instead:

```python
with io.BytesIO() as f:
    f.write(b"Hello World")
    f.write(b"Hello World")
    f.write(b"Hello World")
```
Instead of writing the contents to a file, this writes them to an in-memory buffer, in other words a chunk of RAM. Essentially, writing the following would be the equivalent:
```python
buffer = b""
buffer += b"Hello World"
buffer += b"Hello World"
buffer += b"Hello World"
```
In relation to the example with the with statement, there would also be a del buffer at the end.
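If you do want the accumulated bytes back, call getvalue() before the with block closes the buffer; once closed, the data is discarded (a small sketch):

```python
import io

with io.BytesIO() as f:
    f.write(b"Hello World")
    f.write(b"Hello World")
    f.write(b"Hello World")
    data = f.getvalue()      # grab the bytes before close() discards them

print(data)

# After the with block the buffer is closed and its memory released;
# touching it again raises ValueError.
try:
    f.getvalue()
    still_open = True
except ValueError:
    still_open = False
```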
The key difference here is optimization and performance. Because bytes objects are immutable, each += builds a brand-new object and copies everything written so far, whereas io.BytesIO appends to an internal resizable buffer. That makes it faster than simply concatenating the b"Hello World" chunks one by one.
Just to prove it, here's a small benchmark:

```python
import io
import time

begin = time.time()
buffer = b""
for i in range(50000):
    buffer += b"Hello World"
seconds = time.time() - begin
print("Concat:", seconds)

begin = time.time()
buffer = io.BytesIO()
for i in range(50000):
    buffer.write(b"Hello World")
seconds = time.time() - begin
print("BytesIO:", seconds)
```
Besides the performance gain, using BytesIO instead of concatenating has the advantage that BytesIO can be used in place of a file object. So if you have a function that expects a file object to write to, you can give it that in-memory buffer instead of a file.
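As a sketch, write_report below is a made-up stand-in for any such function; the buffer drops in wherever a writable binary file object would go:

```python
import io

def write_report(f):
    # Stand-in (invented for this sketch) for any routine that
    # expects a writable binary file object.
    f.write(b"report header\n")
    f.write(b"42 rows processed\n")

# It works the same whether f is a disk file opened with
# open("report.bin", "wb") or an in-memory buffer:
buf = io.BytesIO()
write_report(buf)
print(buf.getvalue())
```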
The difference is that open("myfile.jpg", "rb") returns a file object whose data lives on disk and is read on demand, whereas BytesIO is just a buffer holding its data in memory.
Since BytesIO is just a buffer, if you wanted to write its contents to a file later, you'd have to do:

```python
buffer = io.BytesIO()
# ...
with open("test.dat", "wb") as f:
    f.write(buffer.getvalue())
```
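Since BytesIO is also seekable, an alternative to getvalue() is to rewind and read it like any other file (a sketch):

```python
import io

buffer = io.BytesIO()
buffer.write(b"Hello World")

# read() continues from the current position, so rewind first.
buffer.seek(0)
read_back = buffer.read()

# getvalue() returns the whole buffer regardless of position.
whole = buffer.getvalue()
print(read_back, whole)
```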
Also, you didn't mention a version; I'm using Python 3. Related to the examples: I'm using the with statement instead of calling f.close().