
BOM in server response screws up json parsing

I'm trying to write a Python script that posts some JSON to a web server and gets some JSON back. I patched together a few different examples on StackOverflow, and I think I have something that's mostly working.

import urllib2
import json

url = "http://foo.com/API.svc/SomeMethod"
payload = json.dumps( {'inputs': ['red', 'blue', 'green']} )
headers = {"Content-type": "application/json;"}

req = urllib2.Request(url, payload, headers)
f = urllib2.urlopen(req)
response = f.read()
f.close()

data = json.loads(response) # <-- Crashes

The last line throws an exception:

ValueError: No JSON object could be decoded

When I look at response, I see valid JSON, but the first few characters are a BOM:

>>> response
'\xef\xbb\xbf[\r\n  {\r\n    ... Valid JSON here

So, if I manually strip out the first three bytes:

data = json.loads(response[3:])

Everything works and the response is parsed into a Python object.

My Question:

It seems kinda silly that json barfs when you give it a BOM. Is there anything different I can do with urllib2 or the json library to let it know this is a UTF-8 string and to handle it as such? I don't want to manually strip out the first 3 bytes.

Mike Christensen asked Jan 25 '13 23:01



2 Answers

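You should probably yell at whoever's running this service, because a BOM on UTF-8 text is unnecessary. The BOM exists to disambiguate byte order, and UTF-8 is a byte-oriented encoding with no byte-order ambiguity. In UTF-8 the BOM acts only as an encoding signature, which the Unicode standard permits but does not recommend.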
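To make the byte-order point concrete, here is a small Python 3 sketch (the variable names are only illustrative) that encodes the BOM code point U+FEFF in several encodings. The UTF-16 results differ by byte order, while UTF-8 has a single fixed form, the same three bytes seen at the start of the response in the question.

```python
import codecs

BOM = "\ufeff"  # the byte order mark code point

# UTF-16 has two possible byte orders, so the BOM tells them apart.
utf16_le = BOM.encode("utf-16-le")
utf16_be = BOM.encode("utf-16-be")

# UTF-8 has only one byte sequence for U+FEFF; there is nothing to disambiguate.
utf8 = BOM.encode("utf-8")

print(utf16_le, utf16_be, utf8)
print(utf8 == codecs.BOM_UTF8)  # the stdlib exposes the same constant
```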

That said, ideally you should decode bytes before doing anything else with them. Luckily, Python has a codec that recognizes and removes the BOM: utf-8-sig.

>>> '\xef\xbb\xbffoo'.decode('utf-8-sig')
u'foo'

So you just need:

data = json.loads(response.decode('utf-8-sig'))
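
For anyone on Python 3, the same idea can be sketched as below. The helper name `loads_utf8_bom` and the sample payloads are made up for illustration; the only real pieces are `bytes.decode('utf-8-sig')` and `json.loads`. The `utf-8-sig` codec strips a leading BOM when one is present and otherwise behaves like plain `utf-8`, so the helper is safe for both kinds of response.

```python
import json


def loads_utf8_bom(raw: bytes):
    """Decode UTF-8 bytes, dropping a leading BOM if present, then parse JSON."""
    return json.loads(raw.decode("utf-8-sig"))


# Hypothetical payloads standing in for the server response.
with_bom = b'\xef\xbb\xbf[\r\n  {"color": "red"}\r\n]'
without_bom = b'[{"color": "red"}]'

print(loads_utf8_bom(with_bom))
print(loads_utf8_bom(without_bom))

# Decoding with plain "utf-8" keeps the BOM as U+FEFF, and json.loads rejects it.
try:
    json.loads(with_bom.decode("utf-8"))
except ValueError as exc:
    print("plain utf-8 failed:", exc)
```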
Eevee answered Oct 20 '22 13:10


In case I'm not the only one who hit this problem while using the requests module instead of urllib2, here is a solution that works in Python 2.6 as well as 3.3:

import requests
# `pass` is a reserved word in Python, so the credential is named `password` here
r = requests.get(url, params=my_dict, auth=(user, password))
print(r.headers['content-type'])  # 'application/json; charset=utf8'
if r.text.startswith(u'\ufeff'):  # bytes \xef\xbb\xbf in utf-8 encoding
    r.encoding = 'utf-8-sig'
print(r.json())
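
An alternative that skips the text check is to decode the raw bytes yourself. This is only a sketch: `json_from_response` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in so the example runs without a network call. With the real library you would pass the `requests.Response`, whose `.content` attribute holds the undecoded body.

```python
import json
from types import SimpleNamespace


def json_from_response(resp):
    """Parse resp.content (raw bytes) as JSON, tolerating a UTF-8 BOM."""
    return json.loads(resp.content.decode("utf-8-sig"))


# Stand-in for a requests.Response; only the .content attribute is used.
fake_response = SimpleNamespace(content=b'\xef\xbb\xbf{"ok": true}')

parsed = json_from_response(fake_response)
print(parsed)
```

This assumes the server really sends UTF-8; a response in another encoding would need its own codec.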
Aprillion answered Oct 20 '22 11:10