
Wrap an open stream with io.TextIOWrapper

How can I wrap an open binary stream – a Python 2 file, a Python 3 io.BufferedReader, an io.BytesIO – in an io.TextIOWrapper?

I'm trying to write code that will work unchanged:

  • Running on Python 2.
  • Running on Python 3.
  • With binary streams generated from the standard library (i.e. I can't control what type they are).
  • With binary streams made to be test doubles (i.e. no file handle, can't re-open).
  • Producing an io.TextIOWrapper that wraps the specified stream.

The io.TextIOWrapper is needed because its API is expected by other parts of the standard library. Other file-like types exist, but don't provide the right API.
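
As a minimal Python 3 sketch of what that API looks like in practice, wrapping an in-memory byte stream yields an object that belongs to the io class hierarchy, reports its encoding, exposes the wrapped binary stream as `.buffer`, and returns decoded `str` data. The sample text is invented for illustration.

```python
import io

# An in-memory byte stream, of the kind used as a test double.
raw = io.BytesIO("Lorem ipsum\ndolor sit amet\n".encode("utf-8"))
text = io.TextIOWrapper(raw, encoding="utf-8")

# Properties that consumers of a text stream may rely on:
is_iobase = isinstance(text, io.IOBase)  # True: part of the io class hierarchy
encoding = text.encoding                 # "utf-8": the decoding in use
underlying = text.buffer                 # the wrapped binary stream itself
first_line = text.readline()             # "Lorem ipsum\n", already decoded to str
```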

Example

Wrapping the binary stream presented as the subprocess.Popen.stdout attribute:

    import subprocess
    import io

    gnupg_subprocess = subprocess.Popen(
            ["gpg", "--version"], stdout=subprocess.PIPE)
    gnupg_stdout = io.TextIOWrapper(gnupg_subprocess.stdout, encoding="utf-8")

In unit tests, the stream is replaced with an io.BytesIO instance to control its content without touching any subprocesses or filesystems.

    gnupg_subprocess.stdout = io.BytesIO("Lorem ipsum".encode("utf-8"))
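
A sketch of that substitution on Python 3 might look like the following. The helper `first_stdout_line` is hypothetical, and `types.SimpleNamespace` stands in for the `subprocess.Popen` object, providing only the `stdout` attribute that the helper touches.

```python
import io
import types


def first_stdout_line(process):
    # Hypothetical helper: wrap the process's binary stdout as text and
    # return its first line without the trailing newline.
    stdout_text = io.TextIOWrapper(process.stdout, encoding="utf-8")
    return stdout_text.readline().rstrip("\n")


# Test double: a stand-in "process" whose stdout is an in-memory byte stream,
# so no subprocess or filesystem object is created.
fake_process = types.SimpleNamespace(
    stdout=io.BytesIO("Lorem ipsum\n".encode("utf-8")))
line = first_stdout_line(fake_process)
```

On Python 3 the same helper accepts a real `Popen` object, because its `stdout` is an `io.BufferedReader`.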

That works fine on the streams created by Python 3's standard library. The same code, though, fails on streams generated by Python 2:

    [Python 2]
    >>> type(gnupg_subprocess.stdout)
    <type 'file'>
    >>> gnupg_stdout = io.TextIOWrapper(gnupg_subprocess.stdout, encoding="utf-8")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'file' object has no attribute 'readable'
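
The error arises because io.TextIOWrapper calls methods such as `readable()` on the stream it wraps while it is being constructed, and Python 2's built-in `file` type predates the io module and lacks them. The following Python 3 sketch reproduces the same kind of failure with a hypothetical class that, like `file`, offers `read()` but none of the io.IOBase methods.

```python
import io


class ReadOnlyStream(object):
    """Hypothetical stand-in: has read(), but no readable(), writable(),
    seekable() or the other io.IOBase methods."""

    def read(self, size=-1):
        return b""


try:
    io.TextIOWrapper(ReadOnlyStream(), encoding="utf-8")
except AttributeError as exc:
    error_message = str(exc)  # names the missing 'readable' method
else:
    error_message = None
```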

Not a solution: Special treatment for file

An obvious response is to have a branch in the code which tests whether the stream actually is a Python 2 file object, and handle that differently from io.* objects.

That's not an option for well-tested code, because it creates a branch that unit tests – which, in order to run as fast as possible, must not create any real filesystem objects – can't exercise.

The unit tests will be providing test doubles, not real file objects. So creating a branch which those test doubles never exercise defeats the purpose of the test suite.

Not a solution: io.open

Some respondents suggest re-opening (e.g. with io.open) the underlying file handle:

    gnupg_stdout = io.open(
            gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")

That works on both Python 3 and Python 2:

    [Python 3]
    >>> type(gnupg_subprocess.stdout)
    <class '_io.BufferedReader'>
    >>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
    >>> type(gnupg_stdout)
    <class '_io.TextIOWrapper'>

    [Python 2]
    >>> type(gnupg_subprocess.stdout)
    <type 'file'>
    >>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
    >>> type(gnupg_stdout)
    <type '_io.TextIOWrapper'>

But of course it relies on re-opening a real file from its file handle. So it fails in unit tests when the test double is an io.BytesIO instance:

    [Python 2]
    >>> gnupg_subprocess.stdout = io.BytesIO("Lorem ipsum".encode("utf-8"))
    >>> type(gnupg_subprocess.stdout)
    <type '_io.BytesIO'>
    >>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    io.UnsupportedOperation: fileno
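
The same contrast can be reproduced on Python 3 without starting a subprocess. In this sketch a pipe from `os.pipe()` stands in for the subprocess's stdout: it has a real file descriptor that `io.open` can re-open as text, whereas an `io.BytesIO` test double has no descriptor at all.

```python
import io
import os

# A real OS-level pipe stands in for the subprocess's stdout.
read_fd, write_fd = os.pipe()
os.write(write_fd, "Lorem ipsum".encode("utf-8"))
os.close(write_fd)  # close the write end so the reader sees end of data

# Re-opening the descriptor as text works because it is a real file handle.
with io.open(read_fd, mode="r", encoding="utf-8") as text_stream:
    pipe_text = text_stream.read()

# An in-memory test double has no descriptor to re-open.
try:
    io.BytesIO(b"Lorem ipsum").fileno()
except io.UnsupportedOperation:
    bytesio_has_fileno = False
else:
    bytesio_has_fileno = True
```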

Not a solution: codecs.getreader

The standard library also has the codecs module, which provides wrapper features:

    import codecs

    gnupg_stdout = codecs.getreader("utf-8")(gnupg_subprocess.stdout)

That's good because it doesn't attempt to re-open the stream. But it fails to provide the io.TextIOWrapper API. Specifically, it doesn't inherit from io.IOBase and doesn't have the encoding attribute:

    [Python 2]
    >>> type(gnupg_subprocess.stdout)
    <type 'file'>
    >>> gnupg_stdout = codecs.getreader("utf-8")(gnupg_subprocess.stdout)
    >>> type(gnupg_stdout)
    <type 'instance'>
    >>> isinstance(gnupg_stdout, io.IOBase)
    False
    >>> gnupg_stdout.encoding
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/codecs.py", line 643, in __getattr__
        return getattr(self.stream, name)
    AttributeError: '_io.BytesIO' object has no attribute 'encoding'

So codecs doesn't provide objects which substitute for io.TextIOWrapper.
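
The same limitation applies on Python 3, as this short sketch with an in-memory test double shows. Decoding works, but the reader is not an io.IOBase, and a lookup of `encoding` is forwarded to the wrapped `io.BytesIO`, which has no such attribute.

```python
import codecs
import io

# Wrap an in-memory test double with a codecs StreamReader.
reader = codecs.getreader("utf-8")(io.BytesIO("Lorem ipsum".encode("utf-8")))

decoded = reader.read()                    # "Lorem ipsum": decoding itself works
is_iobase = isinstance(reader, io.IOBase)  # False: not in the io class hierarchy
has_encoding = hasattr(reader, "encoding") # False: lookup falls through to BytesIO
```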

What to do?

So how can I write code that works for both Python 2 and Python 3, with both the test doubles and the real objects, which wraps an io.TextIOWrapper around the already-open byte stream?

asked Dec 24 '15 by bignose



1 Answer

Use codecs.getreader to produce a wrapper object:

    text_stream = codecs.getreader("utf-8")(bytes_stream)

Works on Python 2 and Python 3.

answered Sep 28 '22 by jbg
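
Where the caller really does need an io.TextIOWrapper, as the question requires, one possible approach is to adapt any object that has a `read(n)` method into an `io.RawIOBase`, then layer `io.BufferedReader` and `io.TextIOWrapper` on top. The sketch below is written under that assumption and is not the answer's method. `ReadableRawAdapter` and `wrap_text` are hypothetical names, and the code has only been checked against Python 3 semantics. It uses the two-argument `super()` form so that it is at least syntactically valid on Python 2.7, but its behaviour there is an untested assumption.

```python
import io


class ReadableRawAdapter(io.RawIOBase):
    """Hypothetical adapter: presents any object with a read(n) method
    (a Python 2 file, an io.BufferedReader, an io.BytesIO, ...) as a raw
    binary stream that io.BufferedReader can consume."""

    def __init__(self, stream):
        super(ReadableRawAdapter, self).__init__()
        self._stream = stream

    def readable(self):
        return True

    def readinto(self, b):
        # Pull at most len(b) bytes from the wrapped object and copy them
        # into the caller's buffer; returning 0 signals end of stream.
        data = self._stream.read(len(b))
        n = len(data)
        b[:n] = data
        return n


def wrap_text(binary_stream, encoding="utf-8"):
    # Hypothetical helper: no type check on binary_stream, so the same code
    # path is taken for real streams and for test doubles.
    raw = ReadableRawAdapter(binary_stream)
    return io.TextIOWrapper(io.BufferedReader(raw), encoding=encoding)


# Usage with an in-memory test double:
text = wrap_text(io.BytesIO("Lorem ipsum\nsecond line\n".encode("utf-8")))
```

The design choice is to depend only on `read(n)`, which every stream listed in the question provides, rather than on `fileno()` or on the io methods that Python 2's `file` lacks. Closing the wrapper marks the adapter closed but does not close the wrapped stream.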