For HTML5 and Python CGI:
If I include the UTF-8 meta tag, my code doesn't work. If I leave it out, it works.
The page encoding is UTF-8.
print("Content-type:text/html")
print()
print("""
<!doctype html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
şöğıçü
</body>
</html>
""")
This code doesn't work.
print("Content-type:text/html")
print()
print("""
<!doctype html>
<html>
<head></head>
<body>
şöğıçü
</body>
</html>
""")
But this code works.
To open a file with UTF-8 encoding, call open(file, encoding="UTF-8"). The encoding parameter defaults to None, which selects a platform-dependent encoding.
In Java, InputStreamReader accepts a charset for decoding a byte stream into a character stream. Passing StandardCharsets.UTF_8 to the InputStreamReader constructor reads data from a UTF-8 file.
For CGI, using print() requires that the correct codec has been set up for output. print() writes to sys.stdout, and sys.stdout has been opened with a specific encoding; how that encoding is determined is platform dependent and can differ based on how the script is run. Running your script as a CGI script means you pretty much do not know what encoding will be used.
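To see which encoding a given environment actually picked, a small diagnostic along these lines can help. This is only a sketch: the helper name describe_output_encoding is made up for illustration, and it assumes that whatever the script writes to stderr ends up somewhere you can read it (for many web servers that is the error log, but that depends on the server).

```python
import locale
import sys


def describe_output_encoding():
    """Report the encodings Python would use for text output (hypothetical helper)."""
    return {
        # Encoding that print() uses when writing to sys.stdout.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding derived from the locale the process was started with.
        "locale": locale.getpreferredencoding(False),
    }


info = describe_output_encoding()
# Write to stderr so the diagnostic does not end up in the HTTP response.
print(info, file=sys.stderr)
```

Running the same script from a terminal and as a CGI script may well report different values, which is the point made above.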
In your case, the web server has set the locale for text output to a fixed encoding other than UTF-8. Python uses that locale setting to produce output in that encoding. Without the <meta> tag, your browser correctly guesses that encoding (or the server has communicated it in the Content-Type header), but with the <meta> tag you are telling it to use a different encoding, one that is incorrect for the data produced.
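The mismatch described above can be reproduced without a web server. The sketch below assumes, purely for illustration, that the server locale is ISO-8859-9 (the Turkish Latin-5 code page); the actual encoding on your server is unknown and may be something else.

```python
text = "şöğıçü"

# Bytes the script should send if the page is declared as UTF-8.
utf8_bytes = text.encode("utf-8")

# Bytes actually sent if stdout uses a single-byte locale encoding.
# ISO-8859-9 is an assumption made for this example only.
latin5_bytes = text.encode("iso-8859-9")

# What a browser shows when it trusts <meta charset="UTF-8"> but
# receives the single-byte data: the bytes are not valid UTF-8.
as_seen_with_meta = latin5_bytes.decode("utf-8", errors="replace")

# What a browser shows when it guesses the real encoding correctly.
as_seen_without_meta = latin5_bytes.decode("iso-8859-9")

print(len(utf8_bytes), len(latin5_bytes))
print(as_seen_with_meta == text, as_seen_without_meta == text)
```

The two byte strings differ in length and content, so declaring one encoding while sending the other cannot display correctly.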
You can write directly to sys.stdout.buffer, after explicitly encoding to UTF-8. Make a helper function to make this easier:
import sys
def enc_print(string='', encoding='utf8'):
    sys.stdout.buffer.write(string.encode(encoding) + b'\n')
enc_print("Content-type:text/html")
enc_print()
enc_print("""
<!doctype html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
şöğıçü
</body>
</html>
""")
Another approach is to replace sys.stdout with a new io.TextIOWrapper() object that uses the codec you need:
import sys
import io
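def set_output_encoding(codec, errors='strict'):
    # Read the flag before detach(), then rewrap the raw buffer with the new codec.
    line_buffering = sys.stdout.line_buffering
    sys.stdout = io.TextIOWrapper(
        sys.stdout.detach(), encoding=codec, errors=errors,
        line_buffering=line_buffering)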
set_output_encoding('utf8')
print("Content-type:text/html")
print()
print("""
<!doctype html>
<html>
<head></head>
<body>
şöğıçü
</body>
</html>
""")
From https://ru.stackoverflow.com/a/352838/11350
First, don't forget to declare the encoding of the source file:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
Then try
import sys
import codecs
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
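On Python 3.7 and later, a shorter variant of the same idea is the reconfigure() method of text streams, which changes the encoding in place instead of replacing sys.stdout. The sketch below is not from the answers above; the helper name force_utf8 is made up, and the in-memory stream stands in for a sys.stdout that was opened with a non-UTF-8 locale encoding (ISO-8859-9 is again just an assumed example).

```python
import io


def force_utf8(stream):
    """Switch a text stream to UTF-8 in place (needs Python 3.7+)."""
    stream.reconfigure(encoding="utf-8")
    return stream


# Stand-in for a stdout opened with a single-byte locale encoding.
raw = io.BytesIO()
demo = io.TextIOWrapper(raw, encoding="iso-8859-9")

force_utf8(demo)
demo.write("şöğıçü")
demo.flush()

# The underlying bytes are now UTF-8, matching <meta charset="UTF-8">.
print(raw.getvalue() == "şöğıçü".encode("utf-8"))
```

In a CGI script the call would be force_utf8(sys.stdout), placed before the first print().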
Or, if you use Apache 2, add this to your configuration:
AddDefaultCharset UTF-8
SetEnv PYTHONIOENCODING utf8
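The effect of PYTHONIOENCODING can be checked locally before touching the server configuration. This is a rough sketch: it launches a child Python process with the variable set, which only approximates what the web server does when it starts the CGI script, and the helper name run_child_with_utf8_stdout is made up for illustration.

```python
import os
import subprocess
import sys


def run_child_with_utf8_stdout(code):
    """Run Python code in a child process with PYTHONIOENCODING=utf8 and return its raw stdout bytes."""
    env = dict(os.environ, PYTHONIOENCODING="utf8")
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


# The child prints two non-ASCII characters using plain print().
out = run_child_with_utf8_stdout("print('\\u015f\\u00f6')")

# With the variable set, the bytes on the pipe are UTF-8.
print(out.strip() == "şö".encode("utf-8"))
```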