I'm learning about working with streams in Python and I noticed that the IO docs say the following:
The easiest way to create a binary stream is with open() with 'b' in the mode string:

```python
f = open("myfile.jpg", "rb")
```

In-memory binary streams are also available as BytesIO objects:

```python
f = io.BytesIO(b"some initial binary data: \x00\x01")
```
What is the difference between f as defined by open() and f as defined by io.BytesIO()? In other words, what makes an "in-memory binary stream", and how is that different from what open() does?
Python's io module lets us manage file-related input and output operations. The advantage of the module is that its classes and functions cover text streams (with Unicode encoding and decoding), binary streams, and raw streams under one common interface.
Binary I/O (also called buffered I/O) expects bytes-like objects and produces bytes objects. No encoding, decoding, or newline translation is performed. This category of streams can be used for all kinds of non-text data, and also when manual control over the handling of text data is desired.
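The "no encoding, decoding, or newline translation" point is easy to see side by side; a small sketch (the sample bytes are made up for illustration):

```python
import io

raw = b"line1\r\nline2\n"

# A binary stream produces bytes objects unchanged: no encoding,
# decoding, or newline translation is performed.
binary = io.BytesIO(raw)
binary_data = binary.read()
print(binary_data)

# A text stream over the same bytes decodes them and, with the
# default universal-newlines mode, turns b"\r\n" into "\n".
text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii")
text_data = text.read()
print(text_data)
```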
Note that close() is largely a convenience for routines that accept a file-like object and eventually attempt to close it; for an in-memory stream there is no operating-system resource to release, so there is usually no need to call it yourself.
For simplicity's sake, let's consider writing instead of reading for now.
So when you use open(), say:

```python
with open("test.dat", "wb") as f:
    f.write(b"Hello World")
    f.write(b"Hello World")
    f.write(b"Hello World")
```
After executing that, a file called test.dat will be created, containing Hello World three times. The data won't be kept in memory after it's written to the file (unless a name keeps a reference to it).
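You can check that against a real file; a sketch of my own, using a temporary directory so nothing is left behind:

```python
import os
import tempfile

# Write three chunks to a real file on disk, then read them back to
# confirm exactly what open() put there.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "test.dat")   # scratch copy of test.dat
    with open(path, "wb") as f:
        f.write(b"Hello World")
        f.write(b"Hello World")
        f.write(b"Hello World")
    size = os.path.getsize(path)         # 3 * 11 = 33 bytes on disk
    with open(path, "rb") as f:
        data = f.read()
print(size, data)
```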
Now consider io.BytesIO() instead:

```python
with io.BytesIO() as f:
    f.write(b"Hello World")
    f.write(b"Hello World")
    f.write(b"Hello World")
```
Instead of writing the contents to a file, this writes them to an in-memory buffer, in other words a chunk of RAM. Essentially, writing the following would be the equivalent:
```python
buffer = b""
buffer += b"Hello World"
buffer += b"Hello World"
buffer += b"Hello World"
```
In relation to the example with the with statement, there would also be a del buffer at the end.
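If you do want the accumulated bytes back, call getvalue() before the with block closes the buffer; once closed, the data is discarded (a small sketch):

```python
import io

with io.BytesIO() as f:
    f.write(b"Hello World")
    f.write(b"Hello World")
    f.write(b"Hello World")
    data = f.getvalue()      # grab the bytes before close() discards them

print(data)

# After the with block the buffer is closed and its memory released;
# touching it again raises ValueError.
try:
    f.getvalue()
    still_open = True
except ValueError:
    still_open = False
```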
The key difference here is optimization and performance. Because bytes objects are immutable, each += builds a brand-new object and copies everything written so far, whereas io.BytesIO appends to an internal resizable buffer. That makes it faster than simply concatenating the b"Hello World" chunks one by one.
Just to prove it, here's a small benchmark:

```python
import io
import time

begin = time.time()
buffer = b""
for i in range(50000):
    buffer += b"Hello World"
seconds = time.time() - begin
print("Concat:", seconds)

begin = time.time()
buffer = io.BytesIO()
for i in range(50000):
    buffer.write(b"Hello World")
seconds = time.time() - begin
print("BytesIO:", seconds)
```
Besides the performance gain, using BytesIO instead of concatenating has the advantage that BytesIO can be used in place of a file object. So if you have a function that expects a file object to write to, you can give it that in-memory buffer instead of a file.
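As a sketch, write_report below is a made-up stand-in for any such function; the buffer drops in wherever a writable binary file object would go:

```python
import io

def write_report(f):
    # Stand-in (invented for this sketch) for any routine that
    # expects a writable binary file object.
    f.write(b"report header\n")
    f.write(b"42 rows processed\n")

# It works the same whether f is a disk file opened with
# open("report.bin", "wb") or an in-memory buffer:
buf = io.BytesIO()
write_report(buf)
print(buf.getvalue())
```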
The difference is that open("myfile.jpg", "rb") returns a file object whose data lives on disk and is read on demand, whereas BytesIO is just a buffer holding its data in memory.
Since BytesIO is just a buffer, if you wanted to write its contents to a file later, you'd have to do:

```python
buffer = io.BytesIO()
# ...
with open("test.dat", "wb") as f:
    f.write(buffer.getvalue())
```
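Since BytesIO is also seekable, an alternative to getvalue() is to rewind and read it like any other file (a sketch):

```python
import io

buffer = io.BytesIO()
buffer.write(b"Hello World")

# read() continues from the current position, so rewind first.
buffer.seek(0)
read_back = buffer.read()

# getvalue() returns the whole buffer regardless of position.
whole = buffer.getvalue()
print(read_back, whole)
```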
Also, you didn't mention a version; I'm using Python 3. Related to the examples: I'm using the with statement instead of calling f.close().