With the following code:
payload = '''
工作报告
总体情况:良好
'''
r = requests.post("http://httpbin.org/post", data=payload)
What is the default encoding when the data passed to requests.post is a string? UTF-8 or unicode-escape?
If I want to specify an encoding, do I have to encode the string myself and pass a bytes object to the 'data' parameter?
Python 2 uses the str type to store bytes and the unicode type to store Unicode code points. String literals are str by default, which means bytes, and the default encoding is ASCII.
To create a POST request in Python, use the requests.post() method. It accepts url, data, json, and additional keyword arguments, and sends a POST request to the specified URL.
To decode a byte string encoded in UTF-8, use the decode() method defined on bytes objects (str in Python 2). This method accepts two arguments, encoding and errors. encoding names the encoding of the bytes to be decoded, and errors decides how to handle errors that arise during decoding.
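A minimal sketch of the decoding step, assuming Python 3, where decode() lives on bytes and the error-handling argument is spelled errors. The sample text and the deliberately truncated byte string are made up for illustration.

```python
# Encode a sample string to UTF-8 bytes, then decode it back.
text = "总体情况:良好"
raw = text.encode("utf-8")
roundtrip = raw.decode("utf-8")  # errors="strict" is the default

# Drop the last byte so the final multi-byte character is incomplete,
# simulating corrupted input.
broken = raw[:-1]

# errors="replace" substitutes U+FFFD for the undecodable bytes
# instead of raising UnicodeDecodeError.
lenient = broken.decode("utf-8", errors="replace")

# With the default strict handling the same input raises.
try:
    broken.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

Strict decoding is usually the safer default for request bodies, since a silent replacement hides the fact that the sender and receiver disagree about the encoding.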
As per the latest JSON spec (RFC 8259), JSON payloads exchanged with external services must be encoded as UTF-8. Here is a quick solution:
r = requests.post("http://httpbin.org/post", data=payload.encode('utf-8'))
requests uses httplib (http.client in Python 3), which defaults to latin-1 encoding for string bodies. Bytes objects are sent as-is without being re-encoded, so it is always better to use them.
I'd also recommend setting the charset using the headers parameter:
r = requests.post("http://httpbin.org/post", data=payload.encode('utf-8'),
headers={'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8'})
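One way to keep the body bytes and the declared charset from drifting apart is to derive both from a single encoding name. The helper name build_post_kwargs and the text/plain media type below are assumptions made for this sketch, not part of Requests; the returned dict is meant to be unpacked into requests.post.

```python
def build_post_kwargs(text, encoding="utf-8", media_type="text/plain"):
    """Return keyword arguments for requests.post with an explicit charset.

    Hypothetical helper: encodes the text once and declares the same
    encoding in the Content-Type header.
    """
    return {
        "data": text.encode(encoding),
        "headers": {"Content-Type": f"{media_type}; charset={encoding}"},
    }


kwargs = build_post_kwargs("工作报告")
# Usage (needs network access and the requests package):
# r = requests.post("http://httpbin.org/post", **kwargs)
```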
If you actually try your example you will find:
$ python
Python 3.7.2 (default, Jan 29 2019, 13:41:02)
[Clang 10.0.0 (clang-1000.10.44.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> payload = '''
... 工作报告
... 总体情况:良好
... '''
>>> r = requests.post("http://127.0.0.1:8888/post", data=payload)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/venv/lib/python3.7/site-packages/requests/api.py", line 116, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/tmp/venv/lib/python3.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/tmp/venv/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/tmp/venv/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/tmp/venv/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/tmp/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/tmp/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/tmp/venv/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/tmp/venv/lib/python3.7/http/client.py", line 1274, in _send_request
    body = _encode(body, 'body')
  File "/tmp/venv/lib/python3.7/http/client.py", line 160, in _encode
    (name.title(), data[err.start:err.end], name)) from None
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 2-5: Body ('工作报告') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
As described in "Detecting the character encoding of an HTTP POST request", the default encoding for HTTP POST is ISO-8859-1, also known as Latin-1. As the error message at the end of the traceback tells you, you can override this by encoding the body to a UTF-8 bytes string yourself. Of course your server then needs to be expecting UTF-8, too; otherwise it will decode the bytes as Latin-1 and end up with useless mojibake.
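The failure and the mojibake risk can both be reproduced without any network traffic. The sketch below uses only str.encode and bytes.decode; it mimics what the HTTP layer and a mismatched receiver would do, and is not code from Requests itself.

```python
payload = "工作报告"

# Latin-1 cannot represent these characters, so encoding the str
# the way http.client does for str bodies fails.
try:
    payload.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Encoding explicitly to UTF-8 works for this text.
body = payload.encode("utf-8")

# A receiver that assumes Latin-1 decodes the same bytes without error,
# but the result is mojibake rather than the original text.
misread = body.decode("latin-1")

# A receiver that expects UTF-8 recovers the original text.
recovered = body.decode("utf-8")
```

Note that the Latin-1 decode never raises, because every byte value maps to some Latin-1 character. That is why a mismatch on the server side tends to go unnoticed until someone looks at the stored text.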
There is no way in the POST interface itself to enforce this, but your server could require clients to explicitly specify their content encoding by using the charset parameter of the Content-Type header, and return a specific 4xx error code with an explicit error message if it's missing.
Somewhat less disciplinedly, you could have your server attempt to decode incoming POST requests as UTF-8, and reject the POST if that fails.
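Both server-side policies can be sketched as one framework-independent function. The name decode_post_body, the require_charset flag, and the status codes are hypothetical choices for illustration; a real server would wire this into its own request handling and pick the error codes it prefers.

```python
from email.message import Message


def decode_post_body(content_type, body, require_charset=True):
    """Decode a request body, returning (status_code, text_or_error).

    Hypothetical server-side helper; the status codes are illustrative.
    """
    # Let the stdlib header parser extract the charset parameter.
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    charset = msg.get_param("charset")

    if charset is None:
        if require_charset:
            # Strict policy: the client must declare its encoding.
            return 400, "missing charset parameter in Content-Type"
        # Lenient policy: try UTF-8 and reject the request on failure.
        charset = "utf-8"

    try:
        return 200, body.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # LookupError covers charset names Python does not know.
        return 400, f"body is not valid {charset}"
```

For example, decode_post_body("text/plain; charset=utf-8", "工作报告".encode("utf-8")) returns the decoded text with status 200, while the same call with a bare "text/plain" header is rejected under the strict policy.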