When I run into Unicode printing problems, what should I check? In my particular case, I'm using an installed module that prints Unicode characters using the wrong codec.
Several disparate settings affect how Python encodes and decodes text, and specifically how it handles printable data in different circumstances.
Some things off the top of my head:
- LC_ALL, LANG
- the sys module setting sys.getdefaultencoding()
What else am I forgetting?
I'm only interested in Python 3.
Here is what I found, in order of how I recommend checking them:
1. Environment variables
   - LC_ALL, LANG, LC_CTYPE, LANGUAGE
   - PYTHONIOENCODING, PYTHONCOERCECLOCALE
   - PYTHONLEGACYWINDOWSSTDIO
   (PYTHON* variables are ignored when Python is started with -E; check sys.flags.ignore_environment)
2. The sys module
   - sys.getdefaultencoding() (sys.setdefaultencoding was removed from Python 3)
   - sys.stdin.encoding
   - sys.stdout.encoding
   - sys.stderr.encoding
   - sys.getfilesystemencoding()
3. The source file's coding: declaration, as in # -*- coding: utf-8 -*-, which affects how the parser interprets string literals in that file.
4. The locale module
   - locale.nl_langinfo(locale.CODESET)
   - locale.getdefaultlocale
   - locale.getpreferredencoding
5. The gettext module and its various facilities (too many to list all of them), such as gettext.install(application, directory) or gettext.bindtextdomain(domain, directory)
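To see how PYTHONIOENCODING and the -E flag interact, one option is to start a fresh interpreter with a modified environment and ask it what it chose for sys.stdout.encoding. This is only a rough sketch: the helper name child_stdout_encoding is made up for illustration, and the value reported under -E depends on the locale of the machine it runs on.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(extra_env=None, flags=()):
    """Start a fresh interpreter and return its sys.stdout.encoding (hypothetical helper)."""
    env = dict(os.environ)          # inherit the current environment
    env.update(extra_env or {})     # then apply the overrides under test
    cmd = [sys.executable, *flags, "-c", "import sys; print(sys.stdout.encoding)"]
    out = subprocess.run(cmd, env=env, stdout=subprocess.PIPE, check=True).stdout
    return out.decode("ascii").strip()


# PYTHONIOENCODING is honoured by default ...
forced = child_stdout_encoding({"PYTHONIOENCODING": "latin-1"})
# ... but -E tells Python to ignore all PYTHON* variables.
ignored = child_stdout_encoding({"PYTHONIOENCODING": "latin-1"}, flags=("-E",))

# codecs.lookup() normalises spelling variants such as "latin-1" / "ISO-8859-1".
print("PYTHONIOENCODING=latin-1    ->", forced, "=", codecs.lookup(forced).name)
print("PYTHONIOENCODING=latin-1 -E ->", ignored)
```

The same helper can be pointed at other variables from the list (for example LC_ALL or PYTHONUTF8) to test one setting at a time instead of guessing at combinations.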
Here is a short script to list the values of most of these:
#!/usr/bin/env python3
#
# print various locale information

import locale
import os
import sys


def main():
    print("Python:")
    print(" version:", sys.version.replace("\n", " "))
    print("environment:")
    for env in (
        "LC_ALL",
        "LANG",
        "LC_CTYPE",
        "LANGUAGE",
        "PYTHONUTF8",
        "PYTHONIOENCODING",
        "PYTHONLEGACYWINDOWSSTDIO",
        "PYTHONCOERCECLOCALE",
    ):
        if env in os.environ:
            print(" \"%s\"=\"%s\"" % (env, os.environ[env]))
        else:
            print(" \"%s\" not set" % env)
    print(" -E (ignore PYTHON* environment variables) ?", bool(sys.flags.ignore_environment))
    print()
    print("sys module:")
    print(" sys.getdefaultencoding() \"%s\"" % sys.getdefaultencoding())
    print(" sys.stdin.encoding \"%s\"" % sys.stdin.encoding)
    print(" sys.stdout.encoding \"%s\"" % sys.stdout.encoding)
    print(" sys.stderr.encoding \"%s\"" % sys.stderr.encoding)
    print(" sys.getfilesystemencoding() \"%s\"" % sys.getfilesystemencoding())
    print()
    print("locale module:")
    # nl_langinfo is not available on every platform (e.g. Windows)
    if hasattr(locale, "nl_langinfo"):
        print(" locale.nl_langinfo(locale.CODESET) \"%s\""
              % locale.nl_langinfo(locale.CODESET))
    else:
        print(" locale.nl_langinfo not available")
    # the functions below are missing in some Python versions, hence the try blocks
    try:
        print(" locale.getencoding() \"%s\"" % locale.getencoding())
    except AttributeError:
        print(" locale.getencoding() not available")
    try:
        print(" locale.getlocale()", (locale.getlocale(),))
    except AttributeError:
        print(" locale.getlocale() not available")
    try:
        print(" locale.getpreferredencoding() \"%s\""
              % locale.getpreferredencoding())
    except AttributeError:
        print(" locale.getpreferredencoding() not available")
    try:
        print(" locale.getdefaultlocale()[1] \"%s\""
              % locale.getdefaultlocale()[1])
    except AttributeError:
        print(" locale.getdefaultlocale() not available")


if __name__ == "__main__":
    main()
On Windows 10 using Python 3.7 within the built-in PowerShell terminal, this prints:
PS> python.exe print-locale.py
environment:
-E (ignore PYTHON* environment variables) ? False
"LC_ALL" not set
"LANG" not set
"LC_CTYPE" not set
"LANGUAGE" not set
"PYTHONIOENCODING"="UTF-8"
"PYTHONLEGACYWINDOWSSTDIO" not set
sys module:
getdefaultencoding "utf-8"
sys.stdin.encoding "UTF-8"
sys.stdout.encoding "UTF-8"
sys.stderr.encoding "UTF-8"
locale:
locale.nl_langinfo not available
locale.getdefaultlocale()[1] "cp1252"
locale.ngetpreferredencoding() "cp1252"
On Debian 9 using Python 3.5, this prints:
$ python print-locale.py
environment:
-E (ignore PYTHON* environment variables) ? False
"LC_ALL" not set
"LANG"="en_GB.UTF-8"
"LC_CTYPE" not set
"LANGUAGE" not set
"PYTHONIOENCODING" not set
"PYTHONLEGACYWINDOWSSTDIO" not set
sys module:
getdefaultencoding "utf-8"
sys.stdin.encoding "UTF-8"
sys.stdout.encoding "UTF-8"
sys.stderr.encoding "UTF-8"
locale:
locale.nl_langinfo(locale.CODESET) "UTF-8"
locale.getdefaultlocale()[1] "UTF-8"
locale.ngetpreferredencoding() "UTF-8"
On Ubuntu 14.04 using Python 3.4, this prints:
$ python print-locale.py
environment:
-E (ignore PYTHON* environment variables) ? False
"LC_ALL" not set
"LANG"="en_US.UTF-8"
"LC_CTYPE" not set
"LANGUAGE"="en_US:"
"PYTHONIOENCODING" not set
"PYTHONLEGACYWINDOWSSTDIO" not set
sys module:
getdefaultencoding "utf-8"
sys.stdin.encoding "UTF-8"
sys.stdout.encoding "UTF-8"
sys.stderr.encoding "UTF-8"
locale:
locale.nl_langinfo(locale.CODESET) "UTF-8"
locale.getdefaultlocale()[1] "UTF-8"
locale.getpreferredencoding() "UTF-8"
Unfortunately, when I run into Unicode printing problems with installed modules, it is not immediately obvious which setting is affecting that module. Understanding how these different parameters and settings interact is even more confounding, and there are many combinations of settings to test.
But this little bit might help someone get started.
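One way to get a feel for why the stream encoding matters is to write the same text through in-memory streams that use different codecs. The sketch below is an illustration only: bytes_written is a made-up helper, and cp1252 is used because it is the value locale.getpreferredencoding() reported in the Windows run above.

```python
import io


def bytes_written(text, encoding, errors="strict"):
    """Return the bytes a text stream with this encoding emits for `text` (hypothetical helper)."""
    raw = io.BytesIO()
    # newline="" disables newline translation so the bytes are the same on every OS
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline="")
    print(text, end="", file=stream)
    stream.flush()
    return raw.getvalue()


sample = "caf\u00e9 \u2713"  # "café" followed by a check mark

# The same str becomes different bytes depending on the stream's codec.
print("utf-8           :", bytes_written(sample, "utf-8"))
print("cp1252 (replace):", bytes_written(sample, "cp1252", errors="replace"))

# With the default strict error handler, an unencodable character raises.
try:
    bytes_written(sample, "cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 (strict) : cannot encode", ascii(exc.object[exc.start:exc.end]))
```

If a module's output looks wrong only on some machines, comparing sys.stdout.encoding on those machines against the codec the terminal actually expects is usually a reasonable first check.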
Also see the helpful answers to the SO question How to set sys.stdout encoding in Python 3?
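As one possible approach on Python 3.7 and later, text streams offer a reconfigure() method that can change the encoding of an already-open stream. The following is a minimal sketch, not a definitive fix: force_utf8 is a hypothetical helper name, and it is demonstrated on an in-memory stream rather than the real sys.stdout so that running it has no side effects.

```python
import io


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure() (hypothetical helper)."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # e.g. Python < 3.7, or stdout replaced by an object without reconfigure()
        return False
    reconfigure(encoding="utf-8", errors="backslashreplace")
    return True


# Stand-in for a stdout that was opened with a codec that cannot encode the text.
raw = io.BytesIO()
demo = io.TextIOWrapper(raw, encoding="ascii")

force_utf8(demo)
print("\u2713", end="", file=demo, flush=True)  # would fail under the ascii codec
print(demo.encoding, raw.getvalue())
```

In a real program the call would be force_utf8(sys.stdout), made early and before the misbehaving module writes anything. This changes only how Python encodes the output; whether the terminal displays it correctly is a separate matter.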
Related: python -X UTF8 ..., PYTHONLEGACYWINDOWSFSENCODING, and source-file encoding declarations (# -*- coding: ... -*-).
Some help came from this pymotw article, the Python Unicode HOWTO, and the documentation for the Python sys and locale modules.