What is the best way to load JSON Strings in Python?
I want to use json.loads to process unicode like this:
import json
json.loads(unicode_string_to_load)
I also tried supplying the 'encoding' parameter with the value 'utf-16', but the error did not go away.
Full SSCCE with error:
# -*- coding: utf-8 -*-
import json
value = '{"foo" : "bar"}'
print(json.loads(value)['foo']) #This is correct, prints 'bar'
some_unicode = unicode("degradé")
#last character is latin e with acute, "\xc3\xa9" in UTF-8
value = '{"foo" : "' + some_unicode + '"}'
print(json.loads(value)['foo']) #incorrect, throws error
Error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)
Converting the string to a unicode string using 'latin-1' fixed this error:
UnicodeDecodeError: 'utf16' codec can't decode byte 0x38 in position 6: truncated data
Fixed code:
import json
ustr_to_load = unicode(str_to_load, 'latin-1')
json.loads(ustr_to_load)
And then the error is not thrown.
The OP clarifies (in a comment!)...:
Source data is huge unicode encoded string
Then you have to know which of the many possible encodings it uses. Clearly not 'utf-16', since that failed, but there are so many others: 'utf-8', 'iso-8859-15', and so forth. You either try them all until one works, or print repr(str_to_load[:80]) and paste what it shows as an edit of your question, so we can guess on your behalf!-).
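For what it's worth, the "try them all until one works" route could look roughly like the sketch below. It assumes the source data is available as raw bytes. The helper name loads_guessing_encoding and the candidate list are made up for illustration and are not part of the json module. The order of candidates matters: 'latin-1' maps every possible byte to a character, so it never raises and would hide the real encoding if it were tried first.

```python
import json

# Hypothetical candidate list. 'latin-1' goes last because it accepts any
# byte sequence and therefore never raises a decode error.
CANDIDATE_ENCODINGS = ('utf-8', 'utf-16', 'latin-1')


def loads_guessing_encoding(raw_bytes, encodings=CANDIDATE_ENCODINGS):
    """Return (parsed_json, encoding) for the first encoding that works."""
    for encoding in encodings:
        try:
            text = raw_bytes.decode(encoding)
            return json.loads(text), encoding
        except ValueError:
            # UnicodeDecodeError and JSON parse errors are both
            # subclasses of ValueError, so one clause covers both.
            continue
    raise ValueError('none of the candidate encodings produced valid JSON')


# UTF-8 bytes like the ones in the question (e with acute is \xc3\xa9).
raw = b'{"foo" : "degrad\xc3\xa9"}'
data, used = loads_guessing_encoding(raw)
print(used)
```

A decode that succeeds is not proof that the guess is right: decoding those same UTF-8 bytes as 'latin-1' also raises no error, but it yields two wrong characters in place of the é. Looking at repr() of the first few bytes is still the more reliable check.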