I needed to parse files generated by another tool, which unconditionally outputs JSON files with a UTF-8 BOM header (EFBBBF). I soon found that this was the problem, as the Python 2.7 json module can't seem to parse it:
>>> import json
>>> data = json.load(open('sample.json'))
ValueError: No JSON object could be decoded
Removing the BOM solves it, but I wonder if there is another way of parsing a JSON file with a BOM header?
JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
"sig" in "utf-8-sig" is the abbreviation of "signature" (i.e. signature utf-8 file). Using utf-8-sig to read a file will treat the BOM as metadata that explains how to interpret the file, instead of as part of the file contents.
BOM characters are invisible characters that can be added to the start of a text file as additional information. They are used, for example, to indicate a specific text encoding. In FileMaker add-ons, JSON files are given such a BOM character.
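To make the difference between the two codecs concrete, here is a minimal sketch that builds a BOM-prefixed byte string in memory and decodes it both ways. The payload `{"key": "value"}` is made up for illustration; `codecs.BOM_UTF8` is the standard-library constant for the EFBBBF bytes.

```python
import codecs
import json

# Illustrative payload (an assumption, not from the original files):
# the three BOM bytes EF BB BF followed by a small JSON document.
raw = codecs.BOM_UTF8 + b'{"key": "value"}'

# Plain 'utf-8' keeps the BOM as a leading U+FEFF character in the text.
as_utf8 = raw.decode('utf-8')

# 'utf-8-sig' treats the BOM as a signature and drops it.
as_sig = raw.decode('utf-8-sig')

# 'utf-8-sig' is also safe for input that has no BOM at all.
no_bom = b'{"key": "value"}'.decode('utf-8-sig')

# Only the text without the leading U+FEFF is handed to the parser.
data = json.loads(as_sig)
```

Because `utf-8-sig` accepts input with or without a BOM, it is a reasonable default when you cannot control whether the producing tool writes one.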
You can open with codecs:
import json
import codecs
json.load(codecs.open('sample.json', 'r', 'utf-8-sig'))
or decode with utf-8-sig yourself and pass to loads:
json.loads(open('sample.json').read().decode('utf-8-sig'))
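A variant of the same idea that should run on both Python 2.7 and Python 3 is to use `io.open` with `encoding='utf-8-sig'`, so the file object does the decoding. The sketch below is only an illustration: the helper name `load_json_allowing_bom` and the sample contents are hypothetical, and it writes a temporary BOM-prefixed file so it can be run on its own.

```python
import codecs
import io
import json
import os
import tempfile


def load_json_allowing_bom(path):
    """Hypothetical helper: parse a UTF-8 JSON file with or without a BOM."""
    # io.open behaves the same on Python 2.7 and 3; the 'utf-8-sig'
    # codec strips a leading BOM if one is present.
    with io.open(path, 'r', encoding='utf-8-sig') as handle:
        return json.load(handle)


# Create a throwaway file that starts with the UTF-8 BOM (EF BB BF).
# The contents are made up for this demonstration.
fd, sample_path = tempfile.mkstemp(suffix='.json')
with os.fdopen(fd, 'wb') as out:
    out.write(codecs.BOM_UTF8 + b'{"name": "sample", "values": [1, 2, 3]}')

try:
    data = load_json_allowing_bom(sample_path)
finally:
    os.remove(sample_path)
```

Using a `with` block also closes the file promptly, which the one-line versions above leave to the garbage collector.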