
Mypy with Python 2: insist on a unicode value, not a string value

Python 2 will implicitly convert str to unicode in some circumstances. This conversion will sometimes throw a UnicodeError depending on what you try to do with the resulting value. I don't know the exact semantics, but it's something I'd like to avoid.
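
As a rough illustration (a sketch, assuming the interpreter's default encoding is still ASCII; the helper name implicit_coerce is hypothetical), the conversion behaves much like an explicit ASCII decode. It succeeds for plain ASCII input and raises for anything else. The explicit decode is used here so the snippet also runs on Python 3:

```python
from typing import Text


def implicit_coerce(raw):
    # type: (bytes) -> Text
    """Mimic Python 2's implicit str -> unicode step (hypothetical helper).

    Python 2 decodes with sys.getdefaultencoding(), which is 'ascii'
    unless the interpreter has been reconfigured (assumed here).
    """
    return raw.decode("ascii")


# Pure-ASCII bytes convert without complaint.
ok = implicit_coerce(b"abc")

# Non-ASCII bytes raise UnicodeDecodeError, a subclass of UnicodeError.
try:
    implicit_coerce(b"caf\xc3\xa9")
    failed = False
except UnicodeError:
    failed = True
```

This is why the failure depends on the data rather than on the code path: the same expression works for one value and raises for another.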

Is it possible to use another type besides unicode or a command-line argument similar to --strict-optional (http://mypy-lang.blogspot.co.uk/2016/07/mypy-043-released.html) to cause programs using this implicit conversion to fail to type check?

def returns_string_not_unicode():
    # type: () -> str
    return u"a"

def returns_unicode_not_string():
    # type: () -> unicode
    return "a"

In this example, only the function returns_string_not_unicode fails to type check.

$ mypy --py2 unicode.py
unicode.py: note: In function "returns_string_not_unicode":
unicode.py:3: error: Incompatible return value type (got "unicode", expected "str")

I would like both of them to fail to typecheck.

EDIT:

Annotating the return type as bytes seems to be treated the same way as str:

def returns_string_not_unicode():
    # type: () -> bytes
    return u"a"

Gregory Nisbet asked Oct 28 '25 10:10


1 Answer

This is, unfortunately, an ongoing and currently unresolved issue -- see https://github.com/python/mypy/issues/1141 and https://github.com/python/typing/issues/208.

A partial fix is to use typing.Text which is (unfortunately) currently undocumented (I'll work on fixing that though). It's aliased to str in Python 3 and to unicode in Python 2. It won't resolve your actual issue or cause the second function to fail to typecheck, but it does make it a bit easier to write types compatible with both Python 2 and Python 3.
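
A minimal sketch of what that looks like in practice, using the comment-style annotations from the question (the function name greet is made up for illustration):

```python
from typing import Text


def greet(name):
    # type: (Text) -> Text
    # Text is unicode on Python 2 and str on Python 3, so this
    # signature means "a text string" under both interpreters.
    return u"Hello, " + name


greeting = greet(u"world")
```

Because Text is only an alias, mypy in Python 2 mode should treat this exactly like a unicode annotation, so passing a plain str literal would still be accepted.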

In the meantime, you can hack together a partial workaround using the recently implemented NewType feature -- it lets you define a pseudo-subclass with minimal runtime cost, which you can use to approximate the functionality you're looking for:

from typing import NewType, Text

# Tell mypy to treat 'Unicode' as a subtype of `Text`, which is
# aliased to 'unicode' in Python 2 and 'str' (aka unicode) in Python 3
Unicode = NewType('Unicode', Text)

# Comment-style annotation, since Python 2 has no annotation syntax
def unicode_not_str(a):
    # type: (Unicode) -> Unicode
    return a

# my_unicode is still the original string at runtime, but Mypy
# treats it as having a distinct type from `str` and `unicode`.
my_unicode = Unicode(u"some string")

unicode_not_str(my_unicode)      # typechecks
unicode_not_str("foo")           # fails
unicode_not_str(u"foo")          # fails, unfortunately
unicode_not_str(Unicode("bar"))  # works, unfortunately

It's not perfect, but if you're principled about when you elevate a string into being treated as being of your custom Unicode type, you can get something approximating the type safety you're looking for with minimal runtime cost until the bytes/str/unicode issue is settled.
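
One way to stay principled is to funnel every conversion through a single helper, so that Unicode(...) is called in exactly one place. This is only a sketch: the helper name to_unicode and the UTF-8 default are assumptions, not anything mypy or typing provides.

```python
from typing import NewType, Text, Union

Unicode = NewType('Unicode', Text)


def to_unicode(value, encoding="utf-8"):
    # type: (Union[bytes, Text], str) -> Unicode
    """Single place where values are elevated to Unicode (hypothetical helper)."""
    if isinstance(value, bytes):
        # Decode explicitly instead of relying on implicit ASCII coercion.
        return Unicode(value.decode(encoding))
    return Unicode(value)


name = to_unicode(b"caf\xc3\xa9")   # byte string, decoded as UTF-8
same = to_unicode(u"caf\u00e9")     # already text, only wrapped
```

Functions that declare Unicode parameters then accept only values that came through to_unicode. Mypy will not stop someone from calling Unicode("bar") directly, so this relies on convention rather than enforcement.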

Note that NewType was added in mypy version 0.4.4; with an older version, you'll need to install mypy from the master branch on GitHub to use it.

Michael0x2a answered Oct 31 '25 01:10