Version: Python 2.7.3
Other libraries: Python-Requests 1.2.3, jinja2 (2.6)
I have a script that submits data to a forum, and the problem is that non-ASCII characters appear as garbage. For instance, a name like André Téchiné comes out as AndrÃ© TÃ©chinÃ©.
Here's how the data is submitted:
1) Data is initially loaded from a UTF-8 encoded CSV file like so:
entries = []
with codecs.open(filename, 'r', 'utf-8') as f:
    for row in unicode_csv_reader(f.readlines()[1:]):
        entries.append(dict(zip(csv_header, row)))
unicode_csv_reader is from the bottom of Python CSV documentation page: http://docs.python.org/2/library/csv.html
When I inspect an entry's name in the interpreter, I see u'Andr\xe9 T\xe9chin\xe9'.
2) Next I render the data through jinja2:
tpl = tpl_env.get_template(u'forumpost.html')
rendered = tpl.render(entries=entries)
When I inspect rendered in the interpreter, the name again shows up as u'Andr\xe9 T\xe9chin\xe9'
Now, if I write the rendered variable to a file like this, it displays correctly:
with codecs.open('out.txt', 'a', 'utf-8') as f:
    f.write(rendered)
But I must send it to the forum:
3) In the POST request code I have:
params = {u'post': rendered}
headers = {u'content-type': u'application/x-www-form-urlencoded'}
session.post(posturl, data=params, headers=headers, cookies=session.cookies)
session is a Requests session.
And the name is displayed broken in the forum post. I have tried the following:
If I type rendered.encode('utf-8') I see the following:
'Andr\xc3\xa9 T\xc3\xa9chin\xc3\xa9'
How could I fix the issue? Thanks.
Your client behaves as it should. For example, running nc -l 8888 as a server and making a request:
import requests
requests.post('http://localhost:8888', data={u'post': u'Andr\xe9 T\xe9chin\xe9'})
shows:
POST / HTTP/1.1
Host: localhost:8888
Content-Length: 33
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate, compress
Accept: */*
User-Agent: python-requests/1.2.3 CPython/2.7.3

post=Andr%C3%A9+T%C3%A9chin%C3%A9
You can check that it is correct:
>>> import urllib
>>> urllib.unquote_plus(b"Andr%C3%A9+T%C3%A9chin%C3%A9").decode('utf-8')
u'Andr\xe9 T\xe9chin\xe9'
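To see where that body comes from without opening a socket, here is a minimal sketch that builds the same form body with the standard library. It assumes the value is encoded to UTF-8 before percent-encoding, which is what the captured request suggests. The try/except import is only there so the snippet runs on both Python 2 and Python 3, and the variable names are illustrative.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

name = u'Andr\xe9 T\xe9chin\xe9'

# Encode the text to UTF-8 bytes first, then percent-encode those bytes.
# Each non-ASCII character becomes two bytes, hence two %XX escapes.
body = urlencode({'post': name.encode('utf-8')})

# The body is pure ASCII, so its length in characters equals its length in bytes.
content_length = len(body)
```

If the body and length match what nc prints, the client side is doing its job and the problem is further downstream.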
Check that the server decodes the request correctly. You could try specifying the charset explicitly:
headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}
The body contains only ASCII characters, so it shouldn't hurt, and a correct server would ignore any parameters for the x-www-form-urlencoded type anyway. Look for the gory details under "URL-encoded form data".
Also check that the issue is not a display artefact, i.e., the value is correct but it displays incorrectly.
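For illustration, here is a small sketch of what a wrong decode on the receiving side looks like. It assumes the server (or the page that displays the post) reads the UTF-8 bytes as Latin-1; that is only a guess at the cause, but it is a common one for this kind of garbling.

```python
name = u'Andr\xe9 T\xe9chin\xe9'

# What the client sends after percent-decoding: UTF-8 bytes.
utf8_bytes = name.encode('utf-8')

# A receiver that wrongly assumes Latin-1 turns each two-byte sequence
# into two separate characters (\xc3 and \xa9 instead of \xe9).
misread = utf8_bytes.decode('latin-1')

# The damage is reversible as long as nothing was dropped in between.
repaired = misread.encode('latin-1').decode('utf-8')
```

If the garbage on the forum looks like misread, the bytes arrived intact and only the decoding or display step is wrong.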
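Try decoding the byte string from UTF-8 into unicode:
unicode(my_string_variable, "utf8")
or decode on input and encode on output:
import jinja2

# decode incoming bytes to unicode once, at the input boundary
sometext = gettextfromsomewhere().decode('utf-8')
env = jinja2.Environment(loader=jinja2.PackageLoader('jinjaapplication', 'templates'))
template = env.get_template('mypage.html')
# encode the rendered unicode to UTF-8 bytes only when writing it out
print template.render(sometext=sometext).encode('utf-8')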
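The same decode-then-encode pattern can be tried without jinja2. The sketch below is only an approximation: plain string formatting stands in for template rendering, and the byte string is a made-up input rather than anything from the question.

```python
# Hypothetical input: bytes as they would arrive from a UTF-8 source.
raw = b'Andr\xc3\xa9 T\xc3\xa9chin\xc3\xa9'

# Decode once, at the input boundary, so everything in between is text.
text = raw.decode('utf-8')

# Stand-in for template rendering; works on text, not bytes.
page = u'<p>{0}</p>'.format(text)

# Encode once, at the output boundary, just before sending or writing.
payload = page.encode('utf-8')
```

Keeping text decoded everywhere except at the two boundaries makes it much easier to spot the one place where a wrong codec sneaks in.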