I'm trying to write a Python script that posts some JSON to a web server and gets some JSON back. I patched together a few different examples on StackOverflow, and I think I have something that's mostly working.
import urllib2
import json
url = "http://foo.com/API.svc/SomeMethod"
payload = json.dumps( {'inputs': ['red', 'blue', 'green']} )
headers = {"Content-type": "application/json;"}
req = urllib2.Request(url, payload, headers)
f = urllib2.urlopen(req)
response = f.read()
f.close()
data = json.loads(response) # <-- Crashes
The last line throws an exception:
ValueError: No JSON object could be decoded
When I look at response, I see valid JSON, but the first few characters are a BOM:
>>> response
'\xef\xbb\xbf[\r\n {\r\n ... Valid JSON here
So, if I manually strip out the first three bytes:
data = json.loads(response[3::])
Everything works and response is turned into a dictionary.
My Question:
It seems kinda silly that json barfs when you give it a BOM. Is there anything different I can do with urllib or the json library to let it know this is a UTF-8 string and to handle it as such? I don't want to manually strip out the first 3 bytes.
You should probably yell at whoever's running this service, because a BOM on UTF-8 text makes no sense. The BOM exists to disambiguate byte order, and UTF-8 is a stream of single bytes, so it has no byte order to disambiguate.
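To see what the BOM looks like in practice, here is a small sketch using only the standard library. It encodes the BOM character U+FEFF in three encodings: the two UTF-16 variants produce the same two bytes in opposite order, while UTF-8 always produces the same three bytes. The variable names are just for illustration.

```python
import codecs

bom = u'\ufeff'  # the BOM code point, U+FEFF

# In UTF-16 the two bytes swap depending on endianness,
# which is the ambiguity the BOM was designed to resolve.
utf16_le = bom.encode('utf-16-le')
utf16_be = bom.encode('utf-16-be')

# In UTF-8 there is only one possible byte sequence.
utf8 = bom.encode('utf-8')

print(repr(utf16_le), repr(utf16_be), repr(utf8))
print(utf8 == codecs.BOM_UTF8)
```

The three bytes printed for UTF-8 are the same `\xef\xbb\xbf` prefix that shows up at the start of the response in the question.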
That said, ideally you should decode bytes before doing anything else with them. Luckily, Python has a codec that recognizes and removes the BOM: utf-8-sig.
>>> '\xef\xbb\xbffoo'.decode('utf-8-sig')
u'foo'
So you just need:
data = json.loads(response.decode('utf-8-sig'))
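One useful property of utf-8-sig is that it only drops the BOM when one is present, so the same call is safe if the server ever stops sending it. Here is a minimal sketch of that, written so it should run on both Python 2.6+ and Python 3. The helper name `loads_bom_safe` and the sample payloads are made up for the example, they are not from the service in the question.

```python
import json

def loads_bom_safe(raw):
    """Decode UTF-8 bytes, dropping a leading BOM if there is one, then parse JSON."""
    return json.loads(raw.decode('utf-8-sig'))

# Hypothetical payloads: the same JSON with and without a leading BOM.
with_bom = b'\xef\xbb\xbf[\r\n {"color": "red"}\r\n]'
without_bom = b'[{"color": "red"}]'

print(loads_bom_safe(with_bom))
print(loads_bom_safe(without_bom))
```

Both calls should give the same parsed list, so there is no need to check for the BOM first.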
In case I'm not the only one who ran into this problem while using the requests module instead of urllib2, here is a solution that works in Python 2.6 as well as 3.3:
import requests
# url, my_dict, user and password stand in for your own values
r = requests.get(url, params=my_dict, auth=(user, password))
print(r.headers['content-type']) # 'application/json; charset=utf8'
if r.text.startswith(u'\ufeff'): # bytes \xef\xbb\xbf in utf-8 encoding
    r.encoding = 'utf-8-sig' # re-decode the body with the BOM-aware codec
print(r.json())
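If you only have the already-decoded text in hand (for example something like r.text above), the BOM shows up as a single leading U+FEFF character rather than three bytes. Here is a rough sketch of handling that case without any network access. The helper name `loads_text` and the sample body are made up for the example, and the body is built by hand to simulate what a plain 'utf-8' decode would give you.

```python
import json

def loads_text(text):
    """Parse JSON from decoded text, skipping one leading U+FEFF if present."""
    if text.startswith(u'\ufeff'):
        text = text[1:]
    return json.loads(text)

# Simulated response body: decoding with plain 'utf-8' keeps the BOM
# as the first character of the resulting text.
decoded_body = b'\xef\xbb\xbf{"ok": true}'.decode('utf-8')

print(decoded_body.startswith(u'\ufeff'))
print(loads_text(decoded_body))
```

Using startswith instead of indexing the first character also avoids an IndexError when the body is empty.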