Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python load json file with UTF-8 BOM header

Tags:

python

json

I needed to parse files generated by other tool, which unconditionally outputs json file with UTF-8 BOM header (EFBBBF). I soon found that this was the problem, as Python 2.7 module can't seem to parse it:

>>> import json >>> data = json.load(open('sample.json'))  ValueError: No JSON object could be decoded 

Removing BOM, solves it, but I wonder if there is another way of parsing json file with BOM header?

like image 227
theta Avatar asked Oct 31 '12 11:10

theta


People also ask

Are JSON files UTF-8?

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

What is the difference between UTF-8 and UTF-8 sig?

"sig" in "utf-8-sig" is the abbreviation of "signature" (i.e. signature utf-8 file). Using utf-8-sig to read a file will treat the BOM as metadata that explains how to interpret the file, instead of as part of the file contents.

What is JSON BOM?

BOM characters are invisible characters that can be added to a text file as additional information. They are used, for example, to define a specific text coding. In FileMaker add-ons, JSON files are given such a BOM character.


1 Answers

You can open with codecs:

import json import codecs  json.load(codecs.open('sample.json', 'r', 'utf-8-sig')) 

or decode with utf-8-sig yourself and pass to loads:

json.loads(open('sample.json').read().decode('utf-8-sig')) 
like image 188
Pavel Anossov Avatar answered Sep 18 '22 17:09

Pavel Anossov