How do I post unicode characters using httplib?

Tags: python, unicode, httplib

I am trying to post unicode data with httplib's request method:

s = u"עברית"
data = """
<spellrequest textalreadyclipped="0" ignoredups="1" ignoredigits="1" ignoreallcaps="0">
<text>%s</text>
</spellrequest>
""" % s

con = httplib.HTTPSConnection("www.google.com")
con.request("POST", "/tbproxy/spell?lang=he", data)
response = con.getresponse().read()

However, I get this error:

Traceback (most recent call last):
  File "C:\Scripts\iQuality\test.py", line 47, in <module>
    print spellFix(u"╫á╫נ╫¿╫ץ╫ר╫ץ")
  File "C:\Scripts\iQuality\test.py", line 26, in spellFix
    con.request("POST", "/tbproxy/spell?lang=%s" % lang, data)
  File "C:\Python27\lib\httplib.py", line 955, in request
    self._send_request(method, url, body, headers)
  File "C:\Python27\lib\httplib.py", line 989, in _send_request
    self.endheaders(body)
  File "C:\Python27\lib\httplib.py", line 951, in endheaders
    self._send_output(message_body)
  File "C:\Python27\lib\httplib.py", line 815, in _send_output
    self.send(message_body)
  File "C:\Python27\lib\httplib.py", line 787, in send
    self.sock.sendall(data)
  File "C:\Python27\lib\ssl.py", line 220, in sendall
    v = self.send(data[count:])
  File "C:\Python27\lib\ssl.py", line 189, in send
    v = self._sslobj.write(data)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 97-102: ordinal not in range(128)

What am I doing wrong?

asked Apr 14 '12 00:04

iTayb

1 Answer

HTTP is not defined in terms of a particular character encoding; it works in octets. You need to encode your data into octets yourself, and then tell the server which encoding you used. Let's use UTF-8, since it's usually the best choice.

This data looks a bit like XML, but you are leaving out the XML declaration. Some services may accept that, but you shouldn't rely on it. In fact, the encoding belongs in that declaration, so make sure you include it. The declaration looks like <?xml version="1.0" encoding="encoding"?>. Putting that together:

s = u"עברית"
data_unicode = u"""<?xml version="1.0" encoding="UTF-8"?>
<spellrequest textalreadyclipped="0" ignoredups="1" ignoredigits="1" ignoreallcaps="0">
<text>%s</text>
</spellrequest>
""" % s

data_octets = data_unicode.encode('utf-8')
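
To make the difference between the unicode text and its octets concrete, here is a minimal sketch (not part of the original code). It writes the word from the question with \u escapes so it runs the same on Python 2 and 3, and it shows that the implicit ASCII encoding is exactly what fails in the traceback:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# The Hebrew word from the question, written with escapes (5 code points).
word = u"\u05e2\u05d1\u05e8\u05d9\u05ea"

# Explicit encoding: each of these letters takes two octets in UTF-8.
octets = word.encode("utf-8")
print(len(word), len(octets))

# Encoding as ASCII is what happens implicitly when a unicode body reaches
# the socket on Python 2, and it cannot represent these characters.
try:
    word.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print("ascii failed:", exc.reason)
```

So the fix is simply to do the encoding yourself, with a codec that can represent the text, before handing the body to request().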

As a matter of courtesy, you should also tell the server the format and encoding, using the Content-Type header:

con = httplib.HTTPSConnection("www.google.com")
con.request("POST",
            "/tbproxy/spell?lang=he", 
            data_octets, {'content-type': 'text/xml; charset=utf-8'})
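
As an illustration of why the charset parameter matters, here is a rough sketch of what a receiver could do with it. The decode_body helper is hypothetical (it is not part of httplib or of the service in question); it only uses the standard library's email.message.Message to parse the header value:

```python
from email.message import Message


def decode_body(content_type, body_octets, default="utf-8"):
    """Hypothetical helper: decode octets using the charset named in a
    Content-Type header value, falling back to `default` if none is given."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or default
    return body_octets.decode(charset)


# Same word as in the question, written with escapes.
octets = u"<text>\u05e2\u05d1\u05e8\u05d9\u05ea</text>".encode("utf-8")
text = decode_body("text/xml; charset=utf-8", octets)
print(text == u"<text>\u05e2\u05d1\u05e8\u05d9\u05ea</text>")
```

Without the header, the receiver has to guess (or look inside the XML declaration), which is why sending both is the polite thing to do.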

EDIT: It's working fine on my machine. Are you sure you're not skipping something? Here is a full example:

>>> from cgi import escape
>>> from urllib import urlencode
>>> import httplib
>>> 
>>> template = u"""<?xml version="1.0" encoding="UTF-8"?>
... <spellrequest textalreadyclipped="0" ignoredups="1" ignoredigits="1" ignoreallcaps="0">
... <text>%s</text>
... </spellrequest>
... """
>>> 
>>> def chkspell(word, lang='en'):
...     data_octets = (template % escape(word)).encode('utf-8')
...     con = httplib.HTTPSConnection("www.google.com")
...     con.request("POST",
...         "/tbproxy/spell?" + urlencode({'lang': lang}),
...         data_octets,
...         {'content-type': 'text/xml; charset=utf-8'})
...     req = con.getresponse()
...     return req.read()
... 
>>> chkspell('baseball')
'<?xml version="1.0" encoding="UTF-8"?><spellresult error="0" clipped="0" charschecked="8"></spellresult>'
>>> chkspell(corpus, 'he')
'<?xml version="1.0" encoding="UTF-8"?><spellresult error="0" clipped="0" charschecked="5"></spellresult>'
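
The example above runs the word through escape() before substituting it into the template. A small sketch of why that matters, with no network involved: build_body is a hypothetical name, and xml.sax.saxutils.escape is used here as a stand-in that is available on both Python 2 and 3 (an assumption on my part, the session above uses cgi.escape).

```python
from __future__ import print_function
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

TEMPLATE = (
    u'<?xml version="1.0" encoding="UTF-8"?>\n'
    u'<spellrequest textalreadyclipped="0" ignoredups="1" '
    u'ignoredigits="1" ignoreallcaps="0">\n'
    u'<text>%s</text>\n'
    u'</spellrequest>\n'
)


def build_body(word):
    """Hypothetical helper: escape XML metacharacters, then encode to octets."""
    return (TEMPLATE % escape(word)).encode("utf-8")


# A word containing characters that would otherwise break the XML.
body = build_body(u"R&D <test>")

# The octets parse as well-formed XML and the original text round-trips.
root = ET.fromstring(body)
print(root.find("text").text)
```

Without the escape() call, the same input would produce a body that is not well-formed XML.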

I did notice that when I pasted your example, it appears in the opposite order on my terminal from how it shows in my browser. Not too surprising considering Hebrew is a right-to-left language.

>>> corpus = u"עברית"
>>> print corpus[0]
ע
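
If you want to convince yourself that the string is stored in logical order and only displayed right-to-left, a quick check with the standard unicodedata module (a sketch, again using escapes for the word) is:

```python
from __future__ import print_function
import unicodedata

# The word from the question in logical (typing) order.
corpus = u"\u05e2\u05d1\u05e8\u05d9\u05ea"

first = corpus[0]
print(unicodedata.name(first))           # HEBREW LETTER AYIN
print(unicodedata.bidirectional(first))  # R (a right-to-left character)
```

The first element of the string is the first letter of the word. A terminal without bidirectional support just prints the characters left to right, which is why it can look reversed compared to the browser.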

answered Sep 30 '22 13:09

SingleNegationElimination

