
cgi python3 problems with encoding

I created a CGI script (running on localhost with Apache) that reads text from a textarea so that I can process it afterwards. Characters like š, ť, é are not displayed correctly. I have tried many approaches. Here is one version of my short script, in which I am just looking for the right way to deal with this.

#!C:/Python33/python 
# -*- coding: UTF-8 -*-
 
import cgi
import cgitb

cgitb.enable()

form = cgi.FieldStorage()
if form.getvalue('textcontent'):
    text_content = form.getvalue('textcontent')
else:
    text_content = ""


print("Content-type:text/html")
print()
print("<!DOCTYPE html>")
print("<html>")
print("<head>")
print("<meta charset='UTF-8'></meta>")
print("</head>")
print("<body>")
print("<form>")
print("text_area:<br />")
print("<textarea name='textcontent' rows='5' cols='20'></textarea>")
print("<br />")
print("<input type='submit' value='submit form' />")
print("</form>")
print("<p>")
print(text_content)
print("</p>")
print("</body>")
print("</html>")

This version uses UTF-8. When I type something into the textarea and submit it, the output looks like this:

čítam -> ��tam

When I use latin-1 as the Python source encoding and utf-8 as the charset in the HTML part, it works like this:

časa -> časa (correctly)

but for characters with an acute accent (for example áno) it raises an error:

UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 0: character maps to <undefined>

sys.stdout.encoding reports cp1250 (I am working under Windows), while sys.getdefaultencoding() returns utf-8.
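The mismatch between a cp1250 stdout and a page declared as UTF-8 can be reproduced without Apache. The snippet below is only a sketch of one plausible mechanism, not a confirmed diagnosis of this setup: it encodes text as cp1250 (as print() would on such a stdout), decodes the bytes as UTF-8 (as a browser honouring the meta charset would), and checks whether U+FFFD can be encoded in cp1250 at all. The helper name can_encode is made up for illustration.

```python
def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# What print() would emit on a stdout whose encoding is cp1250.
cp1250_bytes = "čítam".encode("cp1250")

# What a browser that treats the page as UTF-8 would make of those bytes.
seen_by_browser = cp1250_bytes.decode("utf-8", errors="replace")

# U+FFFD (the replacement character) has no slot in cp1250,
# so printing it to a cp1250 stdout raises UnicodeEncodeError.
fffd_fits_cp1250 = can_encode("\ufffd", "cp1250")
fffd_fits_utf8 = can_encode("\ufffd", "utf-8")
```

Under these assumptions, seen_by_browser contains two replacement characters followed by "tam", which matches the output shown above, and fffd_fits_cp1250 is False, which matches the 'charmap' error.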

I also tried text_content = (form.getvalue('textcontent')).encode('utf-8'). For the word číslo, for example, the result is b'\xef\xbf\xbd\xef\xbf\xbdslo'.
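The bytes b'\xef\xbf\xbd' are the UTF-8 encoding of U+FFFD, the replacement character, which suggests the original letters were already lost before .encode('utf-8') ran. The sketch below shows one way this could happen, assuming (this is not confirmed by the post) that the browser submitted the form as cp1250, so that č and í arrived percent-encoded as %E8 and %ED and were then decoded as UTF-8 with replacement.

```python
from urllib.parse import unquote

# The bytes reported above decode to two replacement characters plus "slo".
reported = b"\xef\xbf\xbd\xef\xbf\xbdslo"
decoded = reported.decode("utf-8")

# Assumption: the form value was sent as cp1250, i.e. "číslo" -> "%E8%EDslo".
# Decoding that as UTF-8 with errors="replace" cannot recover the letters.
submitted_as_cp1250 = "%E8%EDslo"
parsed = unquote(submitted_as_cp1250, encoding="utf-8", errors="replace")

# Re-encoding the damaged string gives exactly the reported bytes.
round_trip = parsed.encode("utf-8")
```

If this is what happens, no amount of re-encoding on the Python side can restore the text; the form data has to arrive as UTF-8 in the first place.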

I don't know how to handle this problem.

I need číslo -> číslo, for example.

UPDATE: I now use UTF-8 both as the Python encoding and as the HTML encoding. Working with the text (comparing words with the dictionary, etc.) seems to go well, so the only remaining problem is the output: it looks like ��tam, and I need it to look like čítam.

UPDATE 2: When the encoding is UTF-8 and the browser is set to UTF-8 too, it displays � characters. When I change the browser encoding to cp1250, it displays correctly, but when I refresh the page or click the Submit button it raises the error UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd'

UPDATE 3: I tried it on Linux, and after a few problems I found out that the Apache server is using the wrong encoding (ascii), but I have not managed to fix this yet. I modified /etc/apache2/envvars to PATH LANG="sk_SK.UTF-8", but gedit printed a warning in the terminal suggesting the edit did not go through properly, so the encoding is still ascii.
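One way to make the script independent of whatever encoding the server environment gives to stdout (cp1250 on Windows, ascii under this Apache setup) is to wrap the underlying binary stream in a text wrapper with an explicit encoding. This is a minimal sketch using io.TextIOWrapper from the standard library; the helper name make_utf8_writer is hypothetical, and the io.BytesIO object only stands in for sys.stdout.buffer so the effect can be inspected.

```python
import io


def make_utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


# In a CGI script this would be applied once, before the first print():
#   sys.stdout = make_utf8_writer(sys.stdout.buffer)

# Stand-in for sys.stdout.buffer, used here only for demonstration.
demo = io.BytesIO()
writer = make_utf8_writer(demo)
print("čítam", file=writer)
writer.flush()
written = demo.getvalue()
```

Here written starts with the UTF-8 bytes for čítam regardless of the locale the process was started with.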

asked Nov 10 '22 08:11 by TheBP


1 Answer

Write your form this way:

<form accept-charset="utf-8">

Put accept-charset="utf-8" in your forms; it can solve this problem.
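A rough sketch of how this could fit into the script from the question follows. It goes beyond the accept-charset attribute itself and rests on several assumptions: the form is submitted with GET (the original form has no method attribute), the query string is parsed with urllib.parse.parse_qs instead of cgi.FieldStorage, and the response is written as UTF-8 bytes with the charset also stated in the Content-Type header. The function names are made up for illustration.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def read_text_content(query_string):
    """Extract the 'textcontent' field, decoding percent-escapes as UTF-8."""
    fields = parse_qs(query_string, encoding="utf-8", errors="replace")
    return fields.get("textcontent", [""])[0]


def build_response(text_content):
    """Return the full CGI response (headers and page) as UTF-8 bytes."""
    body = (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n<meta charset='UTF-8'>\n</head>\n<body>\n"
        '<form accept-charset="utf-8">\n'
        "text_area:<br />\n"
        "<textarea name='textcontent' rows='5' cols='20'></textarea>\n"
        "<br />\n<input type='submit' value='submit form' />\n"
        "</form>\n"
        # Escape user input so it is shown as text, not interpreted as HTML.
        "<p>" + html.escape(text_content) + "</p>\n"
        "</body>\n</html>\n"
    )
    headers = "Content-Type: text/html; charset=utf-8\r\n\r\n"
    return (headers + body).encode("utf-8")


def main():
    text_content = read_text_content(os.environ.get("QUERY_STRING", ""))
    # Write bytes directly, bypassing the encoding of sys.stdout.
    sys.stdout.buffer.write(build_response(text_content))
    sys.stdout.buffer.flush()


# GATEWAY_INTERFACE is set by CGI servers; run only in that environment.
if __name__ == "__main__" and "GATEWAY_INTERFACE" in os.environ:
    main()
```

Writing bytes through sys.stdout.buffer means the output no longer depends on whether the server reports cp1250, ascii, or something else for sys.stdout.encoding.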

answered Dec 05 '22 07:12 by Kedes Dias Torres