
Why does mypy complain about TextIOWrapper receiving GzipFile as argument 1?

I am writing data to an in-memory binary stream so that I can upload it to S3 without storing it in a local file (I have more memory than disk space). The following code works, but running mypy mvce.py fails with

mvce.py:6: error: Argument 1 to "TextIOWrapper" has incompatible type "GzipFile";
expected "IO[bytes]"
Found 1 error in 1 file (checked 1 source file)

mvce.py

from io import BytesIO, TextIOWrapper
import gzip

inmem = BytesIO()
with gzip.GzipFile(fileobj=inmem, mode="wb") as gzip_handler, TextIOWrapper(
    gzip_handler, encoding="utf-8"
) as wrapper:
    wrapper.write("some test string")


# Check if this actually worked
with open("foobar.gzip", "wb") as f1:
    inmem.seek(0)
    f1.write(inmem.read())


with gzip.open("foobar.gzip", "rb") as f2:
    data = f2.read()

print(data)

Question

Why does mypy fail and how do I make it work? Are there hidden potential issues?

Martin Thoma asked Oct 15 '19 12:10


1 Answer

Why does MyPy fail?

MyPy uses a set of type stubs called Typeshed to define types for the standard library. In Typeshed, gzip.GzipFile does not inherit from typing.IO[bytes].

The class hierarchy is: gzip.GzipFile -> _compression.BaseStream -> io.BufferedIOBase -> io.IOBase.
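The runtime classes tell a similar story, which can be checked with a quick sketch. Note the assumption here: mypy reads the Typeshed stubs rather than the runtime classes, so this only illustrates the hierarchy and does not show what mypy sees. The names mro_names, is_buffered and is_typing_io are made up for this example.

```python
import gzip
import io
import typing

# Names of the runtime base classes of GzipFile, in method resolution order.
# The module that holds the private BaseStream class varies between
# Python versions, so only the class names are collected here.
mro_names = [cls.__name__ for cls in gzip.GzipFile.__mro__]
print(" -> ".join(mro_names))

# GzipFile is a buffered binary stream at runtime ...
is_buffered = issubclass(gzip.GzipFile, io.BufferedIOBase)
# ... but it is not a subclass of the typing.IO generic class.
is_typing_io = issubclass(gzip.GzipFile, typing.IO)
print(is_buffered, is_typing_io)
```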

How do I make it work?

You can use typing.cast(IO[bytes], gzip_handler) to hint to MyPy that the GzipFile instance should be considered a binary file object. See the documentation for more info about casts.
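Applied to the code from the question, the cast could look roughly like this. It is a sketch rather than the only way to write it; the nested with blocks and the compressed variable are choices made for this example.

```python
import gzip
from io import BytesIO, TextIOWrapper
from typing import IO, cast

inmem = BytesIO()
with gzip.GzipFile(fileobj=inmem, mode="wb") as gzip_handler:
    # cast() only informs the type checker; at runtime it returns
    # gzip_handler unchanged, so the behaviour is the same as before.
    binary_file = cast(IO[bytes], gzip_handler)
    with TextIOWrapper(binary_file, encoding="utf-8") as wrapper:
        wrapper.write("some test string")

# GzipFile does not close a fileobj it was handed, so inmem is still usable.
compressed = inmem.getvalue()
print(gzip.decompress(compressed))
```

Since cast does nothing at runtime, it silences the error without verifying anything: if the object were not actually usable as a binary file, mypy would no longer notice.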

Alternatively, you could use gzip.open(inmem, mode='wt', encoding="utf-8") to get a text file object directly (essentially the same as what you're doing, see below). This function has return type IO[Any] in Typeshed.
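A minimal sketch of that alternative, reusing the in-memory buffer from the question (the variable names compressed and round_trip are assumptions for this example):

```python
import gzip
from io import BytesIO

inmem = BytesIO()
# Text mode ("wt") returns a text file object, so str can be written directly.
with gzip.open(inmem, mode="wt", encoding="utf-8") as wrapper:
    wrapper.write("some test string")

# The buffer passed in is left open, so the compressed bytes can be read back.
compressed = inmem.getvalue()
round_trip = gzip.decompress(compressed).decode("utf-8")
print(round_trip)
```

This avoids the cast entirely, at the cost of the looser return type mentioned above.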

Are there hidden potential issues?

The gzip documentation says this about the gzip.open() function:

For text mode, a GzipFile object is created, and wrapped in an io.TextIOWrapper instance with the specified encoding, error handling behavior, and line ending(s).

So your code should work fine in practice.
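The documented behaviour can be observed in CPython with a small check like the one below. It is only a sketch: it inspects the object returned by gzip.open() through the buffer attribute of TextIOWrapper, and the names wrapper_type and inner_type are made up for this example.

```python
import gzip
import io

inmem = io.BytesIO()
with gzip.open(inmem, mode="wt", encoding="utf-8") as text_file:
    # The object returned in text mode, and the binary stream it wraps.
    wrapper_type = type(text_file)
    inner_type = type(text_file.buffer)
    text_file.write("some test string")

print(wrapper_type.__name__, inner_type.__name__)
```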

Can this be fixed in Typeshed?

I tried adding IO[bytes] as a superclass of GzipFile in Typeshed, and I got one error in the tests:

stdlib/3/gzip.pyi:17: error: Definition of "__enter__" in base class "IOBase" is incompatible with definition in base class "IO"

The solution to this issue is left as an exercise for the reader.

augurar answered Nov 02 '22 03:11