How can I wrap an open binary stream – a Python 2 file, a Python 3 io.BufferedReader, an io.BytesIO – in an io.TextIOWrapper?
I'm trying to write code that will work unchanged on Python 2 and Python 3, with both the real streams and the test doubles, and that produces an io.TextIOWrapper that wraps the specified stream.
The io.TextIOWrapper is needed because its API is expected by other parts of the standard library. Other file-like types exist, but don't provide the right API.
Wrapping the binary stream presented as the subprocess.Popen.stdout attribute:

    import subprocess
    import io

    gnupg_subprocess = subprocess.Popen(
        ["gpg", "--version"], stdout=subprocess.PIPE)
    gnupg_stdout = io.TextIOWrapper(gnupg_subprocess.stdout, encoding="utf-8")
In unit tests, the stream is replaced with an io.BytesIO instance to control its content without touching any subprocesses or filesystems.

    gnupg_subprocess.stdout = io.BytesIO("Lorem ipsum".encode("utf-8"))
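As a minimal sketch of why the test double works, the snippet below wraps an io.BytesIO in an io.TextIOWrapper and reads decoded text back. The name fake_stdout and the byte content are made up for illustration and are not part of the original code.

```python
import io

# Hypothetical stand-in for gnupg_subprocess.stdout in a unit test.
fake_stdout = io.BytesIO(u"Lorem ipsum\nDolor sit amet\n".encode("utf-8"))

# io.BytesIO provides readable(), writable() and seekable(), which
# io.TextIOWrapper calls on the stream it wraps.
text_stream = io.TextIOWrapper(fake_stdout, encoding="utf-8")

first_line = text_stream.readline()   # decoded text, not bytes
remaining = text_stream.read()
```

Because io.BytesIO comes from the io module on both Python 2 and Python 3, this part of the problem behaves the same on either version.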
That works fine on the streams created by Python 3's standard library. The same code, though, fails on streams generated by Python 2:
    [Python 2]
    >>> type(gnupg_subprocess.stdout)
    <type 'file'>
    >>> gnupg_stdout = io.TextIOWrapper(gnupg_subprocess.stdout, encoding="utf-8")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'file' object has no attribute 'readable'
file
An obvious response is to have a branch in the code which tests whether the stream actually is a Python 2 file object, and handle that differently from io.* objects.
That's not an option for well-tested code, because it makes a branch that unit tests – which, in order to run as fast as possible, must not create any real filesystem objects – can't exercise.
The unit tests will be providing test doubles, not real file objects. So creating a branch which won't be exercised by those test doubles is defeating the test suite.
io.open
Some respondents suggest re-opening (e.g. with io.open) the underlying file handle:

    gnupg_stdout = io.open(
        gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
That works on both Python 3 and Python 2:
    [Python 3]
    >>> type(gnupg_subprocess.stdout)
    <class '_io.BufferedReader'>
    >>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
    >>> type(gnupg_stdout)
    <class '_io.TextIOWrapper'>

    [Python 2]
    >>> type(gnupg_subprocess.stdout)
    <type 'file'>
    >>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
    >>> type(gnupg_stdout)
    <type '_io.TextIOWrapper'>
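The re-opening approach can be tried end to end without gpg installed. This sketch is an illustration only: the child process is the running Python interpreter printing a fixed string (a hypothetical stand-in for gpg), and closefd=False is an addition of mine so that the descriptor stays owned by the Popen object.

```python
import io
import subprocess
import sys

# Stand-in child process (assumption: replaces the real "gpg --version" call).
child = subprocess.Popen(
    [sys.executable, "-c", "print('Lorem ipsum')"],
    stdout=subprocess.PIPE)

# Re-open the pipe's file descriptor as a text stream. closefd=False leaves
# closing the descriptor to child.stdout.
text_stream = io.open(
    child.stdout.fileno(), mode="r", encoding="utf-8", closefd=False)

text = text_stream.read()
wrapper_type = type(text_stream)

child.wait()
text_stream.close()
child.stdout.close()
```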
But of course it relies on re-opening a real file from its file handle. So it fails in unit tests when the test double is an io.BytesIO instance:

    >>> gnupg_subprocess.stdout = io.BytesIO("Lorem ipsum".encode("utf-8"))
    >>> type(gnupg_subprocess.stdout)
    <type '_io.BytesIO'>
    >>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    io.UnsupportedOperation: fileno
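The failure can be reproduced in isolation. The sketch below only shows that an in-memory stream has no file descriptor to re-open; the variable names are illustrative.

```python
import io

test_double = io.BytesIO(u"Lorem ipsum".encode("utf-8"))

try:
    test_double.fileno()
    has_descriptor = True
except io.UnsupportedOperation:
    # An in-memory buffer is not backed by an OS-level file descriptor.
    has_descriptor = False
```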
codecs.getreader
The standard library also has the codecs module, which provides wrapper features:

    import codecs

    gnupg_stdout = codecs.getreader("utf-8")(gnupg_subprocess.stdout)
That's good because it doesn't attempt to re-open the stream. But it fails to provide the io.TextIOWrapper API. Specifically, it doesn't inherit from io.IOBase and doesn't have the encoding attribute:

    >>> type(gnupg_subprocess.stdout)
    <type 'file'>
    >>> gnupg_stdout = codecs.getreader("utf-8")(gnupg_subprocess.stdout)
    >>> type(gnupg_stdout)
    <type 'instance'>
    >>> isinstance(gnupg_stdout, io.IOBase)
    False
    >>> gnupg_stdout.encoding
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/codecs.py", line 643, in __getattr__
        return getattr(self.stream, name)
    AttributeError: '_io.BytesIO' object has no attribute 'encoding'
So codecs doesn't provide objects which substitute for io.TextIOWrapper.
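The same two gaps can be checked programmatically rather than in an interactive session. This is a small sketch using an io.BytesIO as the underlying stream; the variable names are illustrative.

```python
import codecs
import io

byte_stream = io.BytesIO(u"Lorem ipsum".encode("utf-8"))
reader = codecs.getreader("utf-8")(byte_stream)

# The codecs reader is not part of the io class hierarchy.
is_iobase = isinstance(reader, io.IOBase)

# Unknown attributes are looked up on the wrapped stream, and io.BytesIO
# has no 'encoding' attribute, so the lookup fails.
has_encoding = hasattr(reader, "encoding")

decoded = reader.read()  # decoding itself still works
```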
So how can I write code that works for both Python 2 and Python 3, with both the test doubles and the real objects, which wraps an io.TextIOWrapper around the already-open byte stream?
StringIO and BytesIO are classes in the io module that hold string and bytes data in memory. StringIO is used for text data and BytesIO is used for binary data. These classes create file-like objects backed by an in-memory buffer, so they are most useful in scenarios where you need to mimic a normal file.
TextIOWrapper class: The file object returned by the open() function in text mode is an object of type _io.TextIOWrapper. The class _io.TextIOWrapper provides methods and attributes which help us read and write text data to and from the file.
The io module provides Python's main facilities for dealing with various types of I/O. There are three main types of I/O: text I/O, binary I/O and raw I/O. These are generic categories, and various backing stores can be used for each of them.
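To make the three categories concrete, here is a rough sketch that creates one stream of each kind and checks which io base class it belongs to. The temporary file exists only to obtain a raw stream and is an assumption of this example.

```python
import io
import os
import tempfile

text_stream = io.StringIO(u"Lorem ipsum")     # text I/O
binary_stream = io.BytesIO(b"Lorem ipsum")    # binary (buffered) I/O

# Raw I/O: opening a file in binary mode with buffering disabled
# returns an unbuffered io.FileIO object.
fd, path = tempfile.mkstemp()
os.close(fd)
raw_stream = io.open(path, "rb", buffering=0)

is_text = isinstance(text_stream, io.TextIOBase)
is_binary = isinstance(binary_stream, io.BufferedIOBase)
is_raw = isinstance(raw_stream, io.RawIOBase)

raw_stream.close()
os.remove(path)
```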
Use codecs.getreader to produce a wrapper object:

    import codecs

    text_stream = codecs.getreader("utf-8")(bytes_stream)

This works on Python 2 and Python 3.
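A short usage sketch of this answer, assuming the byte stream is an io.BytesIO test double holding UTF-8 data (the content is made up for illustration):

```python
import codecs
import io

# Hypothetical byte stream containing non-ASCII text encoded as UTF-8.
bytes_stream = io.BytesIO(u"caf\u00e9\nna\u00efve\n".encode("utf-8"))

text_stream = codecs.getreader("utf-8")(bytes_stream)

# Iterating the reader yields decoded lines of text.
lines = [line.rstrip(u"\n") for line in text_stream]
```

Note that the resulting object decodes text correctly but, as the question points out, it is not an io.TextIOWrapper, so it only helps where the consuming code needs just read-style methods.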