I am trying to POST Unicode data with httplib's request
method:
s = u"עברית"
data = """
<spellrequest textalreadyclipped="0" ignoredups="1" ignoredigits="1" ignoreallcaps="0">
<text>%s</text>
</spellrequest>
""" % s
con = httplib.HTTPSConnection("www.google.com")
con.request("POST", "/tbproxy/spell?lang=he", data)
response = con.getresponse().read()
However, this is the error I get:
Traceback (most recent call last):
File "C:\Scripts\iQuality\test.py", line 47, in <module>
print spellFix(u"╫á╫נ╫¿╫ץ╫ר╫ץ")
File "C:\Scripts\iQuality\test.py", line 26, in spellFix
con.request("POST", "/tbproxy/spell?lang=%s" % lang, data)
File "C:\Python27\lib\httplib.py", line 955, in request
self._send_request(method, url, body, headers)
File "C:\Python27\lib\httplib.py", line 989, in _send_request
self.endheaders(body)
File "C:\Python27\lib\httplib.py", line 951, in endheaders
self._send_output(message_body)
File "C:\Python27\lib\httplib.py", line 815, in _send_output
self.send(message_body)
File "C:\Python27\lib\httplib.py", line 787, in send
self.sock.sendall(data)
File "C:\Python27\lib\ssl.py", line 220, in sendall
v = self.send(data[count:])
File "C:\Python27\lib\ssl.py", line 189, in send
v = self._sslobj.write(data)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 97-102: ordinal not in range(128)
What am I doing wrong?
To include Unicode characters in your Python source code, you can use Unicode escape sequences of the form \u0123 in a string literal. In Python 2.x, you also need to prefix the string literal with 'u'.
A Unicode string is a sequence of code points, which are numbers from 0 through 0x10FFFF (1,114,111 decimal). This sequence of code points needs to be represented in memory as a set of code units, and the code units are then mapped to 8-bit bytes.
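A minimal sketch of that mapping, using the Hebrew word from the question written with \u escapes so the source file itself stays ASCII. The variable names are mine, and the snippet is written to run on both Python 2.7 and Python 3:

```python
# The word from the question, spelled with Unicode escapes.
word = u"\u05e2\u05d1\u05e8\u05d9\u05ea"

# Each character is one code point (a plain integer).
code_points = [ord(ch) for ch in word]

# Encoding turns the code points into a sequence of 8-bit bytes.
utf8_bytes = word.encode("utf-8")

print(len(word))                         # number of code points
print([hex(cp) for cp in code_points])   # the code points themselves
print(len(utf8_bytes))                   # number of bytes after encoding
```

Each of these Hebrew letters takes two bytes in UTF-8, so the byte count is twice the character count.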
HTTP is not defined in terms of a particular character encoding; it works with octets. You need to convert your data to an encoding, and then you need to tell the server which encoding you have used. Let's use UTF-8, since it's usually the best choice:
This data looks a bit like XML, but you are omitting the XML declaration. Some services may accept that, but you shouldn't rely on it. In fact, the encoding belongs there, so make sure you include it. The declaration looks like <?xml version="1.0" encoding="encoding"?>, where encoding is the name of the encoding you used.
s = u"עברית"
data_unicode = u"""<?xml version="1.0" encoding="UTF-8"?>
<spellrequest textalreadyclipped="0" ignoredups="1" ignoredigits="1" ignoreallcaps="0">
<text>%s</text>
</spellrequest>
""" % s
data_octets = data_unicode.encode('utf-8')
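To see why the explicit encode matters, here is a small sketch that needs no network. It assumes, as the traceback suggests, that handing a unicode body to the socket layer in Python 2 amounts to an implicit ASCII encode. The shortened `body` string is only for illustration:

```python
word = u"\u05e2\u05d1\u05e8\u05d9\u05ea"   # the Hebrew word, written with escapes
body = u"<text>%s</text>" % word

try:
    # Roughly what happens implicitly when a unicode body reaches the socket.
    body.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    first_bad_index = exc.start   # index of the first non-ASCII character
    print("ASCII encoding failed at index %d" % first_bad_index)

# Encoding explicitly gives you bytes that can be sent as-is.
body_octets = body.encode("utf-8")
print(len(body_octets))
```

The failure index points at the first Hebrew character, right after the `<text>` tag, which matches the kind of position range reported in the original error.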
As a matter of courtesy, you should also tell the server itself the format and encoding, with the Content-Type header:
con = httplib.HTTPSConnection("www.google.com")
con.request("POST",
            "/tbproxy/spell?lang=he",
            data_octets, {'content-type': 'text/xml; charset=utf-8'})
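One way to keep the body encoding and the declared charset from drifting apart is to produce both in one place. This is just a sketch: `encode_xml_body` is a hypothetical helper of mine, not part of httplib, and the charset in the XML declaration would still have to name the same encoding:

```python
def encode_xml_body(xml_text, charset="utf-8"):
    """Encode xml_text and build a Content-Type header naming the same charset."""
    body_octets = xml_text.encode(charset)
    headers = {"Content-Type": "text/xml; charset=%s" % charset}
    return body_octets, headers


body, headers = encode_xml_body(u"<text>\u05e2\u05d1\u05e8\u05d9\u05ea</text>")
print(headers["Content-Type"])
print(len(body))   # length in bytes, not characters
```

The returned pair can be passed as the third and fourth arguments to con.request.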
EDIT: It works fine on my machine; are you sure you're not skipping something? Full example:
>>> from cgi import escape
>>> from urllib import urlencode
>>> import httplib
>>>
>>> template = u"""<?xml version="1.0" encoding="UTF-8"?>
... <spellrequest textalreadyclipped="0" ignoredups="1" ignoredigits="1" ignoreallcaps="0">
... <text>%s</text>
... </spellrequest>
... """
>>>
>>> def chkspell(word, lang='en'):
...     data_octets = (template % escape(word)).encode('utf-8')
...     con = httplib.HTTPSConnection("www.google.com")
...     con.request("POST",
...                 "/tbproxy/spell?" + urlencode({'lang': lang}),
...                 data_octets,
...                 {'content-type': 'text/xml; charset=utf-8'})
...     req = con.getresponse()
...     return req.read()
...
>>> chkspell('baseball')
'<?xml version="1.0" encoding="UTF-8"?><spellresult error="0" clipped="0" charschecked="8"></spellresult>'
>>> chkspell(corpus, 'he')
'<?xml version="1.0" encoding="UTF-8"?><spellresult error="0" clipped="0" charschecked="5"></spellresult>'
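A note on the escape(word) call in the session above: it is there so that characters such as & and < in the word cannot break the XML. cgi.escape was removed in Python 3.8, so as a sketch of the same step that also runs on newer versions, xml.sax.saxutils.escape can stand in for it. The short template and the sample input are mine:

```python
from xml.sax.saxutils import escape

template = u"<text>%s</text>"

# Characters that are special in XML get replaced by entities
# before the word is substituted into the template.
risky_word = u"fish & chips <b>"
safe_xml = template % escape(risky_word)

print(safe_xml)
```

Plain words such as the Hebrew example pass through unchanged, since escape only touches &, < and >.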
I did notice that when I pasted your example, it appears in the opposite order on my terminal from how it shows in my browser. Not too surprising considering Hebrew is a right-to-left language.
>>> corpus = u"עברית"
>>> print corpus[0]
ע
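The display order is only a rendering matter: the string is stored in logical order, so index 0 is always the first letter as read. A small sketch using the standard unicodedata module to check this without relying on how the terminal draws the text:

```python
import unicodedata

corpus = u"\u05e2\u05d1\u05e8\u05d9\u05ea"   # same word, written with escapes
first = corpus[0]

# The first stored character is the first letter of the word as read.
first_name = unicodedata.name(first)
# Bidirectional class "R" marks a right-to-left character.
first_bidi = unicodedata.bidirectional(first)

print(first_name)
print(first_bidi)
```

So a terminal that shows the word "backwards" is just not applying right-to-left layout; the data being sent is the same either way.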