I've just started learning Python, but I've already run into trouble.
I have a simple script with just one command:
#!/usr/bin/env python3
print("Příliš žluťoučký kůň úpěl ďábelské ódy.") # Text in Czech
When I try to run this script:
python3 hello.py
I get this message:
Traceback (most recent call last):
File "hello.py", line 2, in <module>
print("P\u0159\xedli\u0161 \u017elu\u0165ou\u010dk\xfd k\u016fn \xfap\u011bl \u010f\xe1belsk\xe9 \xf3dy.")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)
I am using Kubuntu 16.04 and Python 3.5.2.
When I ran export PYTHONIOENCODING=utf-8 first, the script worked, but only for that shell session. The next time I opened bash, I got the same error.
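For cases where the environment cannot be changed, a similar effect can be approximated inside the script itself by wrapping the byte stream under sys.stdout in a text layer that always encodes as UTF-8. This is only a sketch, assuming Python 3; utf8_writer is a hypothetical helper name and not part of my original script:

```python
import io
import sys

TEXT = "Příliš žluťoučký kůň úpěl ďábelské ódy."


def utf8_writer(binary_stream):
    """Wrap a binary stream in a text layer that always encodes as UTF-8.

    Hypothetical helper for illustration; it ignores the locale entirely.
    """
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


if __name__ == "__main__":
    # sys.stdout.buffer is the raw byte stream underneath sys.stdout.
    out = utf8_writer(sys.stdout.buffer)
    print(TEXT, file=out)
    # Detach so that discarding the wrapper does not close the real stdout.
    out.detach()
```

This sidesteps the locale rather than fixing it, so the terminal still has to interpret the bytes as UTF-8.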
According to https://docs.python.org/3/howto/unicode.html#the-string-type
the default encoding for Python source code is UTF-8.
So I have the source file saved in UTF-8 and Konsole is set to UTF-8, but I still get the error!
Even if I add
# -*- coding: utf-8 -*-
to the beginning it does nothing.
Another weird thing: when I run it with python instead of python3, it works. How can it work in Python 2.7.12 but not in 3.5.2?
Any ideas for solving this permanently? Thank you.
Thanks to Mark Tolen and Alastair McCormack for suggesting where the problem may be. The problem was really in the locale settings.
When I ran locale, the output was:
LANG=C
LANGUAGE=
LC_CTYPE="C"
LC_NUMERIC=cs_CZ.UTF-8
LC_TIME=cs_CZ.UTF-8
LC_COLLATE=cs_CZ.UTF-8
LC_MONETARY=cs_CZ.UTF-8
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT=cs_CZ.UTF-8
LC_IDENTIFICATION="C"
LC_ALL=
The "C" locale is the default setting, and it uses the ASCII charmap. That is where the problem was. Running locale charmap gave me ANSI_X3.4-1968, which is plain ASCII and cannot represent non-English characters.
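To see why that charmap produces the error, here is a small sketch showing that Python treats ANSI_X3.4-1968 as an alias of its ascii codec, and that the Czech sentence cannot be encoded with it. The helper name can_encode is made up for this illustration:

```python
import codecs

TEXT = "Příliš žluťoučký kůň úpěl ďábelské ódy."


def can_encode(text, encoding):
    """Return True if every character of text fits in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # The charmap name reported by `locale charmap` resolves to the ascii codec.
    print(codecs.lookup("ANSI_X3.4-1968").name)   # ascii
    print(can_encode(TEXT, "ANSI_X3.4-1968"))     # False
    print(can_encode(TEXT, "utf-8"))              # True
```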
I fixed this by following the Ubuntu documentation.
I added these lines to /etc/default/locale:
LANGUAGE=cs_CZ.UTF-8
LC_ALL=cs_CZ.UTF-8
Then you have to restart your session (log out and in) to apply these settings.
Running locale now returns this output:
LANG=C
LANGUAGE=cs
LC_CTYPE="cs_CZ.UTF-8"
LC_NUMERIC="cs_CZ.UTF-8"
LC_TIME="cs_CZ.UTF-8"
LC_COLLATE="cs_CZ.UTF-8"
LC_MONETARY="cs_CZ.UTF-8"
LC_MESSAGES="cs_CZ.UTF-8"
LC_PAPER="cs_CZ.UTF-8"
LC_NAME="cs_CZ.UTF-8"
LC_ADDRESS="cs_CZ.UTF-8"
LC_TELEPHONE="cs_CZ.UTF-8"
LC_MEASUREMENT="cs_CZ.UTF-8"
LC_IDENTIFICATION="cs_CZ.UTF-8"
LC_ALL=cs_CZ.UTF-8
and running locale charmap returns:
UTF-8
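The same check can be made from inside Python. This is a rough sketch that collects the encodings the interpreter has picked up from the environment; encoding_report is a hypothetical helper name, and the exact spelling of the values (for example UTF-8 versus utf-8) can differ between systems and Python versions:

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python derived from the current environment."""
    return {
        # Encoding used when print() writes text to standard output.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding the locale prefers for text data.
        "preferred": locale.getpreferredencoding(False),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(encoding_report().items()):
        print("{}: {}".format(name, value))
```

If the stdout entry still shows ANSI_X3.4-1968, the new locale settings have not been applied to the current session.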