
Python 3 default encoding: UnicodeDecodeError ('ascii' codec) under Apache WSGI

>>> import locale
>>> preferred_encoding = locale.getpreferredencoding()
>>> preferred_encoding
'ANSI_X3.4-1968'
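For context, 'ANSI_X3.4-1968' is another name for plain ASCII, and `open()` falls back to `locale.getpreferredencoding(False)` when no `encoding` argument is given. A minimal sketch for checking both defaults from inside the running process (the helper name `describe_text_defaults` is made up for illustration):

```python
import locale
import sys


def describe_text_defaults():
    """Return the encodings Python falls back on when none is given."""
    return {
        # Used by open() in text mode when no encoding argument is passed.
        "open_default": locale.getpreferredencoding(False),
        # Used by str.encode() / bytes.decode() when no codec is named;
        # this is 'utf-8' on Python 3 and is unrelated to the locale.
        "str_default": sys.getdefaultencoding(),
    }


defaults = describe_text_defaults()
print(defaults)
```

Running this under the Apache process (for example from a temporary request handler) rather than in a login shell should show which value the server actually sees.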

I'm using a framework called INGInious, which uses web.py to render its templates:

web.template.render(os.path.join(root_path, dir_path),
                    globals=self._template_globals,
                    base=layout_path)

The rendering works on my localhost but not on my staging server.

Both run Python 3. I see that web.py enforces UTF-8 encoding only on Python 2 (that's out of my hands):

def __str__(self):
    self._prepare_body()
    if PY2:
        return self["__body__"].encode('utf-8')
    else:
        return self["__body__"]

Here is the stack trace:

    t = self._template(name)
  File "/lib/python3.5/site-packages/web/template.py", line 1028, in _template
    self._cache[name] = self._load_template(name)
  File "/lib/python3.5/site-packages/web/template.py", line 1016, in _load_template
    return Template(open(path).read(), filename=path, **self._keywords)
  File "/lib64/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 83: ordinal not in range(128)

My HTML does include Hebrew characters. A small example:

<div class="modal-content">
                    <div class="modal-header">
                        <button type="button" class="close" data-dismiss="modal">&times;</button>
                        <h4 class="modal-title feedback-modal-title">
                            חישוב האיברים הראשונים בסדרה של איבר ראשון חיובי ויחס שלילי:
                            <span class="red-text">אי הצלחה</span>

I open it like so:

open('/path/to/feedback.html').read()

The line where decoding fails is the one that contains the Hebrew characters.
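The failure can be reproduced without Apache or web.py. The sketch below takes a few words from the template above, encodes them as UTF-8 (as they are stored on disk), and decodes the bytes with both codecs. The helper `try_decode` is a made-up name for illustration.

```python
# Sample text taken from the template in the question.
hebrew = "אי הצלחה"

# This is what the file holds on disk: UTF-8 encoded bytes.
raw = hebrew.encode("utf-8")


def try_decode(data, codec):
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


ascii_result = try_decode(raw, "ascii")  # None: bytes >= 0x80 are not ASCII
utf8_result = try_decode(raw, "utf-8")   # round-trips to the original text
```

This is the same decode step that `open(path).read()` performs implicitly when the process's preferred encoding is ASCII.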

I tried setting some environment variables in ~/.bashrc:

export PYTHONIOENCODING=utf8
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8

under the user centos

The INGInious framework is installed with pip under the Python 3.5 site-packages, and it is served by an Apache server running as the user apache.

I also tried setting the environment variables in code (during app initialization) so that the Apache WSGI process would be aware of them:

import os 
os.environ['LC_ALL'] = 'en_US.UTF-8'
os.environ['LANG'] = 'en_US.UTF-8'
os.environ['LANGUAGE'] = 'en_US.UTF-8'

I have also edited /etc/httpd/conf/httpd.conf using the SetEnv directive:

SetEnv LC_ALL en_US.UTF-8
SetEnv LANG en_US.UTF-8
SetEnv LANGUAGE en_US.UTF-8
SetEnv PYTHONIOENCODING utf8

Then I restarted Apache with sudo service httpd restart, and still no luck.
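One possible explanation, offered as an assumption rather than a confirmed diagnosis: under mod_wsgi, values set with SetEnv are passed to the application in the per-request WSGI `environ` dictionary, not in the process environment, so they would not influence the locale Python picked up at startup. A hypothetical diagnostic app (the function body and output format are invented for illustration) could show where the value is visible:

```python
import os


def application(environ, start_response):
    """Hypothetical diagnostic WSGI app: report where LANG is visible."""
    lines = [
        # Per-request dictionary handed over by the WSGI server.
        "request environ LANG: %r" % environ.get("LANG"),
        # Environment of the server process itself.
        "process os.environ LANG: %r" % os.environ.get("LANG"),
    ]
    body = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

If the first line shows en_US.UTF-8 and the second shows None, the setting never reached the process. The mod_wsgi documentation describes `lang` and `locale` options on the WSGIDaemonProcess directive for setting the daemon process locale; check the documentation for your installed version before relying on them.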

My question is: what is the best practice for solving this? I understand there are hacks, but I want to understand the underlying cause as well as how to fix it.

Thanks!

asked Oct 12 '17 06:10 by WebQube


1 Answer

I finally found the answer: change how the file is read, from

open('/path/to/feedback.html').read()

to

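import codecs

# Decode explicitly as UTF-8 instead of relying on the locale default.
file_path = '/path/to/feedback.html'
with codecs.open(file_path, 'r', encoding='utf8') as f:
    text = f.read()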

If anyone has a more general approach that works, I'll accept their answer.
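On Python 3 the `codecs` module is not strictly needed, since the built-in `open()` (the same function as `io.open`) accepts an `encoding` argument. A minimal sketch, assuming the templates are stored as UTF-8; the helper name `read_utf8` and the temporary file are only for illustration:

```python
import io
import os
import tempfile


def read_utf8(path):
    """Read a text file as UTF-8 regardless of the process locale."""
    with io.open(path, encoding="utf-8") as f:
        return f.read()


# Demo with a throwaway file (the path is generated, not a real template).
sample = '<span class="red-text">אי הצלחה</span>'
fd, tmp_path = tempfile.mkstemp(suffix=".html")
try:
    # Write with an explicit encoding too, so the demo does not depend
    # on the locale of the machine running it.
    with io.open(fd, "w", encoding="utf-8") as f:
        f.write(sample)
    loaded = read_utf8(tmp_path)
finally:
    os.remove(tmp_path)
```

This only covers files opened by your own code. The traceback in the question shows web.py calling `open(path)` without an encoding, so for that call the process locale still decides how the bytes are decoded.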

answered Sep 28 '22 03:09 by WebQube