
python3: UTF-8 encoding in http.server

I have encoding problems when serving a simple web page in python3, using BaseHTTPRequestHandler.

Here is a working example:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

from http.server import BaseHTTPRequestHandler, HTTPServer
from os import curdir, sep, remove

HTML_FILE_NAME = 'test.html'
PORT_NUMBER = 8080

# This class handles any incoming request from the browser
class myHandler(BaseHTTPRequestHandler):

    # Handler for the GET requests
    def do_GET(self):
        self.path = HTML_FILE_NAME
        try:
            with open(curdir + sep + self.path, 'r') as f:
                self.send_response(200)
                self.send_header('Content-type', 'text/html')
                self.end_headers()
                self.wfile.write(bytes(f.read(), 'UTF-8'))
            return
        except IOError:
            self.send_error(404, 'File Not Found: %s' % self.path)

# Write the test page, then create the web server with its request handler
with open(HTML_FILE_NAME, 'w') as f:
    f.write('<!DOCTYPE html><html><body> <p> My name is Jérôme </p> </body></html>')
server = HTTPServer(('', PORT_NUMBER), myHandler)
print('Started httpserver on port %i.' % PORT_NUMBER)

try:
    # Wait forever for incoming HTTP requests
    server.serve_forever()
except KeyboardInterrupt:
    print('Interrupted by the user - shutting down the web server.')
    server.socket.close()
    remove(HTML_FILE_NAME)

The expected result is to serve a web page displaying My name is Jérôme.

Instead, I have: My name is Jérôme

As you can see, the HTML page is encoded as UTF-8 before it is sent, with self.wfile.write(bytes(f.read(), 'UTF-8')), so I think the problem comes from the web server.

How do I tell the web server to serve the page in UTF-8?

roipoussiere asked Jun 03 '16 10:06



2 Answers

The problem goes away if I add:

<meta content="text/html;charset=utf-8" http-equiv="Content-Type">
<meta content="utf-8" http-equiv="encoding">

to the head of my HTML page.

roipoussiere answered Sep 19 '22 13:09


Your web server is already sending the text encoded as UTF-8, but you need to tell your browser the encoding of the bytes it receives. The original HTTP/1.1 specification (RFC 2616) declares ISO-8859-1 as the default charset for text media types.
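The mismatch described above can be reproduced without a server or browser. This is only an illustrative sketch: it encodes the sentence from the question as UTF-8, then decodes the same bytes once as ISO-8859-1 (what a client does when it falls back to that default) and once as UTF-8. The variable names are made up for the example.

```python
# UTF-8 bytes for the text, as the server in the question sends them.
payload = 'My name is Jérôme'.encode('utf-8')

# What a client reconstructs if it assumes ISO-8859-1 instead of UTF-8:
# each accented letter is two bytes in UTF-8, so it becomes two characters.
misread = payload.decode('iso-8859-1')

# What it reconstructs when it knows the bytes are UTF-8.
correct = payload.decode('utf-8')

print(misread)
print(correct)
```

The bytes on the wire are identical in both cases; only the charset the client assumes changes the result.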

The standard HTTP way of doing this is to add a charset parameter to the Content-type header value.

Therefore, you should change your code to read:

self.send_header('Content-type', 'text/html; charset=utf-8')
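For context, here is a minimal sketch of a complete handler that uses this header. It is not the asker's exact script: the Utf8Handler class, the PAGE constant and the make_server helper are hypothetical names introduced for illustration, and the page is served from memory rather than from a file. The Content-Length header is an optional extra.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page content, for illustration only.
PAGE = '<!DOCTYPE html><html><body><p>My name is Jérôme</p></body></html>'


class Utf8Handler(BaseHTTPRequestHandler):
    """Serves PAGE and declares its encoding in the Content-Type header."""

    def do_GET(self):
        body = PAGE.encode('utf-8')
        self.send_response(200)
        # The charset parameter tells the client how to decode the body.
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=0):
    """Builds a server bound to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(('127.0.0.1', port), Utf8Handler)
```

Calling make_server(8080).serve_forever() would then serve the page on port 8080, as in the question.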

Also, watch out for the encoding of your HTML file. Without an encoding given to open(), the encoding is taken from your locale. This won't break anything, unless you end up running this script where the locale is C, POSIX or non-latin Windows. Passing encoding='utf-8' to open() explicitly avoids that dependency.
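A minimal sketch of the file handling with the encoding spelled out, assuming the page is meant to be stored as UTF-8. The temporary path and the variable names are hypothetical and used only for this illustration.

```python
import os
import tempfile

TEXT = '<p>My name is Jérôme</p>'

# Hypothetical temporary location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'test.html')

# Write and read with an explicit encoding instead of the locale default.
with open(path, 'w', encoding='utf-8') as f:
    f.write(TEXT)

with open(path, 'r', encoding='utf-8') as f:
    round_trip = f.read()

# Reading in binary mode yields the UTF-8 bytes directly, which could be
# passed to wfile.write() without a decode/encode round trip.
with open(path, 'rb') as f:
    raw = f.read()
```

With the encoding fixed in the code, the result no longer depends on the locale of the machine running the script.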

Alastair McCormack answered Sep 19 '22 13:09