
What is the default encoding when Python Requests posts data of string type?

With the following code:

payload = '''
 工作报告 
 总体情况:良好 
'''
r = requests.post("http://httpbin.org/post", data=payload)

What is the default encoding when the data posted by Requests is a string? UTF-8 or unicode-escape?

If I want to specify an encoding, do I have to encode the string myself and pass a bytes object to the 'data' parameter?

Jcyrss asked Apr 28 '19 07:04


2 Answers

As per the latest JSON spec (RFC 8259), JSON payloads exchanged with external services must be encoded as UTF-8. Here is a quick solution:

r = requests.post("http://httpbin.org/post", data=payload.encode('utf-8'))

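requests hands a string body down to httplib (http.client in Python 3), which defaults to Latin-1 encoding. Bytes objects are sent as-is, without any implicit encoding, so it is always safer to encode the payload yourself and pass bytes.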
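A minimal sketch of why the encoding matters for this payload, using only `str.encode` from the standard library. No request is sent, and the helper name `try_encode` is made up for illustration:

```python
# Same text as the payload in the question.
payload = "\n 工作报告 \n 总体情况:良好 \n"


def try_encode(text, encoding):
    """Return text encoded with the given codec, or None if it cannot be represented."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


latin1_body = try_encode(payload, "latin-1")  # None: CJK characters are outside Latin-1
utf8_body = try_encode(payload, "utf-8")      # bytes: UTF-8 can represent this text

print(latin1_body is None)
print(len(payload), len(utf8_body))  # character count vs. byte count
```

The byte count is larger than the character count because each of these Chinese characters takes several bytes in UTF-8.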

I'd also recommend setting the charset using the headers parameter:

r = requests.post("http://httpbin.org/post", data=payload.encode('utf-8'),
                  headers={'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8'})
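One way to keep the encoded body and the declared charset in sync is to build both in a single place. The sketch below is only an illustration: `utf8_post_kwargs` is a hypothetical helper, not part of requests, and the `text/plain` default is an assumption that should be replaced by whatever media type the server actually expects.

```python
def utf8_post_kwargs(text, media_type="text/plain"):
    """Build keyword arguments for requests.post: UTF-8 bytes plus a matching header."""
    return {
        "data": text.encode("utf-8"),
        "headers": {"Content-Type": f"{media_type}; charset=utf-8"},
    }


kwargs = utf8_post_kwargs("总体情况:良好")
print(kwargs["headers"]["Content-Type"])

# With requests installed, the result could be sent as:
#   requests.post("http://httpbin.org/post", **kwargs)
```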
neves answered Sep 24 '22 16:09


If you actually try your example, you will find:

$ python
Python 3.7.2 (default, Jan 29 2019, 13:41:02) 
[Clang 10.0.0 (clang-1000.10.44.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> payload = '''
...  工作报告 
...  总体情况:良好 
... '''
>>> r = requests.post("http://127.0.0.1:8888/post", data=payload)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/venv/lib/python3.7/site-packages/requests/api.py", line 116, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/tmp/venv/lib/python3.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/tmp/venv/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/tmp/venv/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/tmp/venv/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/tmp/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/tmp/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/tmp/venv/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/tmp/venv/lib/python3.7/http/client.py", line 1274, in _send_request
    body = _encode(body, 'body')
  File "/tmp/venv/lib/python3.7/http/client.py", line 160, in _encode
    (name.title(), data[err.start:err.end], name)) from None
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 2-5: Body ('工作报告') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

As described in "Detecting the character encoding of an HTTP POST request", the default encoding for HTTP POST is ISO-8859-1, also known as Latin-1. As the error message at the end of the traceback tells you, you can override this by encoding the string to UTF-8 bytes yourself. Of course, your server then needs to expect UTF-8 too; if it decodes the body as Latin-1, it will simply end up with useless mojibake.
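The mismatch is easy to reproduce without any networking. This short sketch encodes a string as UTF-8 and decodes it both ways; the variable names are illustrative only:

```python
text = "良好"
wire = text.encode("utf-8")          # what the client puts on the wire

as_utf8 = wire.decode("utf-8")       # a server that expects UTF-8
as_latin1 = wire.decode("latin-1")   # a server that assumes Latin-1; never raises

print(as_utf8 == text)    # round trip succeeds
print(as_latin1 == text)  # mojibake: one wrong character per byte
print(len(text), len(wire), len(as_latin1))
```

Because every byte value maps to some Latin-1 character, the wrong decode does not raise an error; it silently produces garbage, which is why the two sides have to agree in advance.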

There is no way in the POST interface itself to enforce this, but your server could require clients to explicitly specify their content encoding through the charset parameter of the Content-Type header, and return a specific 4xx error code (a missing charset is a client error, not a server error) with an explicit error message if it is missing.

A somewhat less strict option is to have your server attempt to decode incoming POST requests as UTF-8, and reject the POST if that fails.
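Both server-side policies can be sketched in a framework-independent way. The function below is hypothetical (its name, the `require_charset` flag, and the choice of status 400 are assumptions for illustration); it uses `email.message.Message` from the standard library to read the charset parameter out of a Content-Type value:

```python
from email.message import Message


def decode_post_body(content_type, body, require_charset=True):
    """Return (status, text): status is 200 on success, otherwise 400."""
    msg = Message()
    msg["Content-Type"] = content_type or "application/octet-stream"
    charset = msg.get_content_charset()  # lowercased charset parameter, or None

    if charset is None:
        if require_charset:
            return 400, None   # strict policy: the client must declare a charset
        charset = "utf-8"      # lenient policy: try UTF-8 and see if it decodes
    try:
        return 200, body.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return 400, None       # undecodable bytes or unknown charset name


print(decode_post_body("text/plain; charset=utf-8", "良好".encode("utf-8")))
print(decode_post_body("text/plain", b"abc"))
print(decode_post_body("text/plain", b"\xff", require_charset=False))
```

Decoding is strict by default in Python, so invalid UTF-8 raises instead of being silently replaced, which is what makes the lenient policy usable as a check.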

tripleee answered Sep 26 '22 16:09