BOM in server response screws up json parsing

I'm trying to write a Python script that posts some JSON to a web server and gets some JSON back. I patched together a few different examples on StackOverflow, and I think I have something that's mostly working.

import urllib2
import json

url = "http://foo.com/API.svc/SomeMethod"
payload = json.dumps( {'inputs': ['red', 'blue', 'green']} )
headers = {"Content-type": "application/json;"}

req = urllib2.Request(url, payload, headers)
f = urllib2.urlopen(req)
response = f.read()
f.close()

data = json.loads(response) # <-- Crashes

The last line throws an exception:

ValueError: No JSON object could be decoded

When I look at response, I see valid JSON, but the first few characters are a BOM:

>>> response
'\xef\xbb\xbf[\r\n  {\r\n    ... Valid JSON here

So, if I manually strip out the first three bytes:

data = json.loads(response[3:])

Everything works and data holds the parsed JSON.

My Question:

It seems kinda silly that json barfs when you give it a BOM. Is there anything different I can do with urllib2 or the json library to let it know this is a UTF-8 string and to handle it as such? I don't want to manually strip out the first 3 bytes.

You should probably yell at whoever's running this service, because a BOM on UTF-8 text makes no sense. The BOM exists to disambiguate byte order, and UTF-8 is a stream of single bytes, so it has no byte order to disambiguate.

That said, ideally you should decode bytes before doing anything else with them. Luckily, Python has a codec that recognizes and removes the BOM: utf-8-sig.

>>> '\xef\xbb\xbffoo'.decode('utf-8-sig')
u'foo'

So you just need:

data = json.loads(response.decode('utf-8-sig'))
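For anyone on Python 3, the same idea can be sketched roughly as below. The helper name `loads_utf8_bom` and the sample byte strings are made up for illustration; the only real pieces are `bytes.decode` with the `utf-8-sig` codec and `json.loads`.

```python
import json


def loads_utf8_bom(raw_bytes):
    """Decode UTF-8 bytes, dropping a leading BOM if there is one, then parse JSON."""
    # 'utf-8-sig' strips the BOM when present and behaves like 'utf-8' otherwise.
    return json.loads(raw_bytes.decode("utf-8-sig"))


# Hypothetical response bodies, with and without a BOM.
with_bom = b'\xef\xbb\xbf[{"color": "red"}]'
without_bom = b'[{"color": "red"}]'

print(loads_utf8_bom(with_bom))
print(loads_utf8_bom(without_bom))
```

Because the codec only removes the BOM when it is actually there, this should stay safe if the service ever stops sending one.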

In case I'm not the only one who has hit this problem while using the requests module instead of urllib2, here is a solution that works in Python 2.6 as well as 3.3:

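import requests

# url, my_dict, user and password are placeholders for your own values
r = requests.get(url, params=my_dict, auth=(user, password))
print(r.headers['content-type'])  # 'application/json; charset=utf8'
if r.text.startswith(u'\ufeff'):  # BOM; bytes \xef\xbb\xbf in UTF-8
    r.encoding = 'utf-8-sig'  # decode the body again with the BOM-stripping codec
print(r.json())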
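To see why checking the first character for U+FEFF works, here is a small standard-library sketch (Python 3 assumed, no network involved). The `body` value is a made-up stand-in for the raw response bytes.

```python
import codecs
import json

# Hypothetical response body: a UTF-8 BOM followed by a JSON document.
body = codecs.BOM_UTF8 + b'{"ok": true}'

as_utf8 = body.decode("utf-8")          # the BOM survives as the character U+FEFF
as_utf8_sig = body.decode("utf-8-sig")  # the BOM is dropped during decoding

# json.loads refuses text that still starts with the BOM character.
try:
    json.loads(as_utf8)
    rejected = False
except ValueError:
    rejected = True

parsed = json.loads(as_utf8_sig)
print(repr(as_utf8[0]), rejected, parsed)
```

Decoding as plain UTF-8 leaves the BOM in the text as a leading U+FEFF, which is what the check above looks for, and switching to utf-8-sig removes it.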

Tags:

python

json

urllib

urllib2

Mike Christensen

2 Answers

Eevee

Aprillion
