I have encoding problems when serving a simple web page in python3, using BaseHTTPRequestHandler.
Here is a working example:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
from http.server import BaseHTTPRequestHandler, HTTPServer
from os import curdir, sep, remove

HTML_FILE_NAME = 'test.html'
PORT_NUMBER = 8080

# This class handles any incoming request from the browser
class myHandler(BaseHTTPRequestHandler):
    # Handler for the GET requests
    def do_GET(self):
        self.path = HTML_FILE_NAME
        try:
            with open(curdir + sep + self.path, 'r') as f:
                self.send_response(200)
                self.send_header('Content-type', 'text/html')
                self.end_headers()
                self.wfile.write(bytes(f.read(), 'UTF-8'))
            return
        except IOError:
            self.send_error(404, 'File Not Found: %s' % self.path)

try:
    # Write the test page, then create a web server and define the handler to manage the incoming requests
    with open(HTML_FILE_NAME, 'w') as f:
        f.write('<!DOCTYPE html><html><body> <p> My name is Jérôme </p> </body></html>')
    server = HTTPServer(('', PORT_NUMBER), myHandler)
    print('Started httpserver on port %i.' % PORT_NUMBER)
    # Wait forever for incoming http requests
    server.serve_forever()
except KeyboardInterrupt:
    print('Interrupted by the user - shutting down the web server.')
    server.socket.close()
    remove(HTML_FILE_NAME)
The expected result is to serve a web page displaying My name is Jérôme.
Instead, I have: My name is JÃ©rÃ´me
As you can see, the HTML page is correctly encoded with self.wfile.write(bytes(f.read(), 'UTF-8')), so I think the problem comes from the web server.
How can I tell the web server to serve the page in UTF-8?
Set the Python I/O encoding to UTF-8. For the current shell session only: $ export PYTHONIOENCODING=utf8 (this affects only stdin, stdout and stderr). To make UTF-8 the system's default locale encoding, set the environment variables in /etc/default/locale: LANG="en_US.UTF-8", LC_ALL="en_US.UTF-8", LC_CTYPE="en_US.UTF-8".
In Python 3, you can URL-encode any string using the quote() function from the urllib.parse module. quote() uses UTF-8 by default. Note that quote() treats the / character as safe by default, which means it does not encode /.
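A minimal sketch of that behaviour, assuming a made-up path ('/people/Jérôme Dupont' is only an illustration, not something from the question):

```python
from urllib.parse import quote

# Non-ASCII characters are encoded as UTF-8, then percent-escaped.
name = quote('Jérôme')

# '/' is in the default safe set, so it is left alone.
path_default = quote('/people/Jérôme Dupont')

# Passing safe='' makes quote() escape '/' as well.
path_strict = quote('/people/Jérôme Dupont', safe='')

print(name)
print(path_default)
print(path_strict)
```

Note that this is about encoding URLs, which is a separate matter from the charset of the page body.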
In Python 3, UTF-8 is the default source encoding. When the I/O encoding is not set up correctly, Python commonly raises an error such as "UnicodeEncodeError: 'ascii' codec can't encode character". Printing a string uses the encoding of the output stream, so check sys.stdout.encoding.
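If you want to see what your own environment uses, something like the following should work. The helper name report_encodings is made up for this sketch; the calls inside it are standard library.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python uses in this process (hypothetical helper)."""
    return {
        # Default used by str.encode() and bytes.decode(); 'utf-8' on Python 3.
        'str_default': sys.getdefaultencoding(),
        # Default used by open() in text mode when no encoding is passed
        # (outside of UTF-8 mode).
        'open_default': locale.getpreferredencoding(False),
        # Encoding of the output stream; may be missing if stdout was replaced.
        'stdout': getattr(sys.stdout, 'encoding', None),
    }


info = report_encodings()
for key, value in info.items():
    print(key, '=', value)
```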
The encode() method encodes the string using the specified encoding. If no encoding is specified, UTF-8 is used.
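A quick check of that default, using the sentence from the question. It also shows that bytes(text, 'UTF-8'), as used in the handler, produces the same bytes:

```python
text = 'My name is Jérôme'

default_bytes = text.encode()          # no argument: UTF-8
explicit_bytes = text.encode('utf-8')  # same thing, spelled out
via_bytes = bytes(text, 'UTF-8')       # the form used in the question

# é and ô each take two bytes in UTF-8, so the byte string is longer.
print(len(text), len(default_bytes))
print(default_bytes == explicit_bytes == via_bytes)
```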
No problem if I add:
<meta content="text/html;charset=utf-8" http-equiv="Content-Type">
<meta content="utf-8" http-equiv="encoding">
in my html head.
Your web server is already sending the text encoded as UTF-8, but you need to tell your browser the encoding of the bytes it receives. The HTTP/1.1 spec (RFC 2616) declared ISO-8859-1 as the default for text content.
The standard HTTP way of doing this is to add a charset parameter to the Content-type header value.
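To see where the garbled name comes from, here is a small sketch that decodes the same UTF-8 bytes the way a browser would under each assumption:

```python
text = 'Jérôme'
data = text.encode('utf-8')  # what the server actually sends

# Browser assumes ISO-8859-1: every byte becomes one character.
wrong = data.decode('iso-8859-1')

# Browser is told charset=utf-8: two-byte sequences are read correctly.
right = data.decode('utf-8')

print(wrong)
print(right)
```

The bytes on the wire are identical in both cases; only the declared charset changes how they are read.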
Therefore, you should change your code to read:
self.send_header('Content-type', 'text/html; charset=utf-8')
Also, watch out for the encoding of your HTML file. Without an explicit encoding argument, open() uses the preferred encoding of your locale. This won't break anything unless you end up running this script where the locale is C or POSIX, or on a non-Latin Windows installation. Passing encoding='utf-8' to open() avoids the problem.
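Putting both points together, a handler could look roughly like this. This is only a sketch, not the one true way: the names Utf8Handler and make_server are made up here, and the Content-Length header is an optional extra that the original code did not send.

```python
from functools import partial
from http.server import BaseHTTPRequestHandler, HTTPServer


class Utf8Handler(BaseHTTPRequestHandler):
    """Serve a single HTML file and declare its charset in the header."""

    def __init__(self, html_path, *args, **kwargs):
        # Set before calling the base __init__, which handles the request.
        self.html_path = html_path
        super().__init__(*args, **kwargs)

    def do_GET(self):
        try:
            # Explicit encoding: do not rely on the locale default.
            with open(self.html_path, 'r', encoding='utf-8') as f:
                body = f.read().encode('utf-8')
        except OSError:
            self.send_error(404, 'File Not Found')
            return
        self.send_response(200)
        # The charset parameter tells the browser how to decode the bytes.
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(html_path, port=8080):
    """Build (but do not start) a server for html_path (hypothetical helper)."""
    return HTTPServer(('', port), partial(Utf8Handler, html_path))
```

To run it, call make_server('test.html').serve_forever() after writing test.html with open('test.html', 'w', encoding='utf-8').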