Python3 utf-8 decode issue

The following code runs fine with Python3 on my Windows machine and prints the character 'é':

data = b"\xc3\xa9"

print(data.decode('utf-8'))

However, running the same code in an Ubuntu-based Docker container results in:

UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 0: ordinal not in range(128)

Is there anything I have to install to enable UTF-8 decoding?

asked Jan 04 '23 00:01 by user3923073

1 Answer

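It seems that Ubuntu, depending on the version, uses one encoding or another as the default, and it may vary between the shell and Python as well. Note that the decoding itself succeeds: the UnicodeEncodeError is raised when print() encodes the resulting string for stdout, and Python derives that encoding from the locale, which is often ASCII in a container where LANG/LC_ALL are not set. (Adapted from this posting and also this blog.)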
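A small diagnostic sketch (the helper name reproduce() is my own, hypothetical): it shows that data.decode('utf-8') succeeds, that encoding the result to ASCII, which is what print() does when sys.stdout.encoding is ASCII, raises the same error as in the question, and it prints the encodings Python picked up on the machine where it runs.

```python
import locale
import sys

data = b"\xc3\xa9"

# Decoding succeeds on any system: the bytes are valid UTF-8.
text = data.decode("utf-8")


def reproduce(encoding="ascii"):
    """Encode text as print() would for a stream using `encoding`.

    Returns the error message on failure, or None on success.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # These values decide what print() does on this machine.
    print("stdout encoding:   ", sys.stdout.encoding)
    print("preferred encoding:", locale.getpreferredencoding(False))
    print("ascii stream:", reproduce("ascii"))
    print("utf-8 stream:", reproduce("utf-8"))
```

Running it inside the container and on Windows should make the difference in stdout encoding visible.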

Thus the recommended way seems to be to tell your Python instance to use UTF-8 as the default encoding for its standard streams:

Set the encoding of Python's stdin, stdout and stderr via an environment variable:

export PYTHONIOENCODING=utf8
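To see the effect of PYTHONIOENCODING without touching the container's configuration, here is a rough sketch (assuming Python 3.7+ for subprocess.run(capture_output=...); run_with_io_encoding is a hypothetical helper name) that runs the question's two lines in a child interpreter with the variable set to different values.

```python
import os
import subprocess
import sys

# The snippet from the question, passed to a child interpreter via -c.
CHILD_CODE = 'data = b"\\xc3\\xa9"; print(data.decode("utf-8"))'


def run_with_io_encoding(io_encoding):
    """Run CHILD_CODE with PYTHONIOENCODING set.

    Returns (returncode, stdout bytes, stderr bytes).
    """
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # "ascii" mimics the failing container; "utf-8" is the fix.
    for enc in ("ascii", "utf-8"):
        code, out, err = run_with_io_encoding(enc)
        print(enc, "->", code, out, err.decode("ascii", "replace")[-80:])
```

With "ascii" the child fails with the UnicodeEncodeError from the question; with "utf-8" it writes the two UTF-8 bytes of 'é'.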

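Also, in your source files you can declare the encoding explicitly (see this question + answer, the Python docs and PEP 263). Note that this declaration only tells Python how to decode the source file itself: it does not affect the encoding print() uses for stdout, and in Python 3 source files are assumed to be UTF-8 by default anyway: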

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
....
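A short sketch of what the coding declaration actually controls (the names LATIN1_SOURCE, UTF8_SOURCE and load_s are hypothetical): compile() applies the PEP 263 declaration when decoding source bytes, so the same character can be stored as different bytes on disk and still produce the same string. None of this changes sys.stdout.encoding.

```python
# Source declaring Latin-1, where "é" is the single byte 0xE9.
LATIN1_SOURCE = b"# -*- coding: latin-1 -*-\ns = '\xe9'\n"

# The same statement with no declaration, stored as UTF-8 (Python 3's default).
UTF8_SOURCE = "s = '\u00e9'\n".encode("utf-8")


def load_s(source_bytes):
    """Compile and run source bytes, returning the value bound to `s`."""
    namespace = {}
    exec(compile(source_bytes, "<demo>", "exec"), namespace)
    return namespace["s"]


if __name__ == "__main__":
    # Both print True: the declaration only affects how the bytes are read.
    print(load_s(LATIN1_SOURCE) == "\u00e9")
    print(load_s(UTF8_SOURCE) == "\u00e9")
```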

For files read by Python, you can specify the encoding explicitly in the open() call:

with open(fname, "rt", encoding="utf-8") as f:
    ...
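A minimal round-trip sketch (write_sample, read_text and demo are hypothetical helper names): the UTF-8 bytes for 'é' are written to a temporary file, then read back once with encoding="utf-8" and once with encoding="ascii", which is roughly what an implicit locale default can amount to in such a container.

```python
import os
import tempfile


def write_sample(path):
    # Write the UTF-8 bytes for "é" followed by a newline.
    with open(path, "wb") as f:
        f.write(b"\xc3\xa9\n")


def read_text(path, encoding):
    # Text mode with an explicit encoding, as in the open() call above.
    with open(path, "rt", encoding=encoding) as f:
        return f.read()


def demo():
    """Round-trip the sample through a temp file; return (text, ascii_error)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        write_sample(path)
        text = read_text(path, "utf-8")
        try:
            read_text(path, "ascii")
            error = None
        except UnicodeDecodeError as exc:
            error = exc
        return text, error
    finally:
        os.remove(path)


if __name__ == "__main__":
    text, error = demo()
    print(ascii(text), type(error).__name__)
```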

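There is also a more hackish way, with some side effects, that saves you from specifying the encoding each time. Note that it only works in Python 2: in Python 3, reload() is no longer a builtin and sys.setdefaultencoding() does not exist at all (the default encoding is always UTF-8).

import sys
# Python 2 only: site.py deletes sys.setdefaultencoding() at startup,
reload(sys)  # and reloading sys restores it.
sys.setdefaultencoding('UTF8')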

Please read the warnings about this hack in the related answer and comments.
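In Python 3.7 and later, a less invasive alternative is to reconfigure the output stream itself with io.TextIOWrapper.reconfigure(). This is a sketch under that version assumption, not taken from the linked answers; force_utf8 and make_stream are hypothetical helpers, and make_stream builds an in-memory stand-in for sys.stdout so the effect can be checked without a real terminal.

```python
import io
import sys


def make_stream(encoding):
    """An in-memory stand-in for sys.stdout with a chosen encoding."""
    buf = io.BytesIO()
    return io.TextIOWrapper(buf, encoding=encoding), buf


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure() (3.7+)."""
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
        return True
    return False


if __name__ == "__main__":
    # On Python < 3.7 this is a no-op and print() may still fail.
    force_utf8(sys.stdout)
    print(b"\xc3\xa9".decode("utf-8"))
```

Unlike the setdefaultencoding() hack, this only changes the one stream you pass in, so other code relying on the default encoding is unaffected.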

answered Jan 05 '23 15:01 by planetmaker