I created a cgi script (running at localhost with apache) which will load text from textarea and then I will work with it. I have problems with characters like š,ť,é,.. that they are not displayed correctly. I tried it in many ways. Here is one version of my shortcode in which I am just searching for the right way to deal with it.
#!C:/Python33/python
# -*- coding: UTF-8 -*-
import cgi
import cgitb
cgitb.enable()
form = cgi.FieldStorage()
if form.getvalue('textcontent'):
text_content = form.getvalue('textcontent')
else:
text_content = ""
print ("Content-type:text/html")
print ()
print("<!DOCTYPE html>")
print ("<html>")
print ("<head>")
print("<meta charset='UTF-8'></meta>")
print ("</head>")
print ("<body>")
print ("<form>")
print ("text_area:<br />")
print ("<textarea name='textcontent' rows='5' cols='20'></textarea>")
print ("<br />")
print ("<input type='submit' value='submit form' />")
print ("</form>")
print("<p>")
print(text_content)
print("</p>")
print ("</body>")
print ("</html>")
This way is using UTF-8, when I try to write something, it looks like this (write to textarea and submit):
čítam -> ��tam
When I use latin-1 as python encoding and utf-8 as charset in html part it works like this:
časa -> časa (correctly)
but with characters with an accent mark (for example áno) it returns error:
UnicodeEncodeError: 'charmap' codec can't encode character '\\ufffd' in position 0: character maps to <undefined>\r
With sys.stdout.encoding
it writes cp1250
encoding (work under windows) and with sys.getdefaultencoding()
it returns utf-8
I tried also text_content = (form.getvalue('textcontent')).encode('utf-8')
for example word číslo
and result is b'\xef\xbf\xbd\xef\xbf\xbdslo'
I don't know how to handle this problem.
I need číslo -> číslo
fo example.
UPDATE: Now I have UTF-8 for python as html encoding. It looks like work with text (comparing words with the dictionary,..) is going well, so the only one problem now is that output looks like ��tam, so I need to modify it to look like čítam instead of ��tam.
UPDATE 2: When encoding is UTF-8, and in browser UTF-8 too, it displays �s, when I change browser encoding to cp1250, it displays correctly, but when I refresh the site or click on Submit button it writes error UnicodeEncodeError: 'charmap' codec can't encode character '\\ufffd'
UPDATE 3: Tried it on linux and after a few problems I found out that apache server is using wrong encoding(ascii), but I can't accomplish this problem yet. Modified /etc/apache2/envvars
to PATH LANG="sk_SK.UTF-8" but got some warning in the terminal by gedit that editing was not good. So encoding is still ascii.
write your form in this way:
<form accept-charset="utf-8">
put accept-charset = "utf-8"
in your forms, it can solve this problems
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With