I downloaded the 2015 adverse drug events data from openFDA and I want to run some analysis with Python.
I cannot get the JSON decoding to work.
I am able to find code snippets for gzip but not for plain zip files.
The error message I get is:
TypeError: the JSON object must be str, not 'bytes'
The JSON files are large. Is jsonstreamer, ijson, or another library the recommended tool?
The JSON file looks like this (after manual unzip):
{
    "meta": {
        "last_updated": "2016-11-18",
        "terms": "https://open.fda.gov/terms/",
        "results": {
            "skip": 0,
            "total": 304100,
            "limit": 25000
        },
        "license": "https://open.fda.gov/license/",
        "disclaimer": "Do not rely on openFDA to make decisions regarding medical care. While we make every effort to ensure that data is accurate, you should assume all results are unvalidated. We may limit or otherwise restrict your access to the API in line with our Terms of Service."
    },
This is my code:
import json
import zipfile

d = None
data = None
with zipfile.ZipFile("./data/drug-event-Q4-0001-of-0013.json.zip", "r") as z:
    for filename in z.namelist():
        print(filename)
        with z.open(filename) as f:
            data = f.read()
            d = json.loads(data)
The data you read from the zip file is bytes, but the JSON decoder wants text. So, as usual with this kind of issue, you have to decode the bytes into a string before feeding it to the json module.
I'm assuming the JSON files are saved in UTF-8 encoding, so this will do the trick:
d = json.loads(data.decode("utf-8"))
If your JSON files use a different encoding, change the encoding name accordingly.
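For completeness, here is a minimal sketch of the whole read-and-decode step. Instead of calling decode() on the full byte string, it wraps the zip member in io.TextIOWrapper so json.load() receives text. The function name load_json_members and the tiny sample archive are made up for illustration, and UTF-8 is still an assumption about your files.

```python
import io
import json
import os
import tempfile
import zipfile


def load_json_members(zip_path, encoding="utf-8"):
    """Return {member name: parsed JSON} for every file in the archive."""
    parsed = {}
    with zipfile.ZipFile(zip_path, "r") as z:
        for filename in z.namelist():
            with z.open(filename) as raw:
                # z.open() yields bytes; TextIOWrapper decodes them to str
                # so that json.load() is handed text.
                with io.TextIOWrapper(raw, encoding=encoding) as text:
                    parsed[filename] = json.load(text)
    return parsed


# Demo with a tiny stand-in archive (hypothetical content, not real openFDA data).
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, "sample.json.zip")
with zipfile.ZipFile(demo_zip, "w") as z:
    z.writestr("sample.json", '{"meta": {"results": {"total": 304100}}}')

result = load_json_members(demo_zip)
print(result["sample.json"]["meta"]["results"]["total"])  # prints 304100
```

With your data you would pass "./data/drug-event-Q4-0001-of-0013.json.zip" as zip_path. Note that this still parses each file fully into memory, just like json.loads() does.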
Regarding your second question: how large is 'large'?
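One way to put a number on it is to ask the archive itself for the uncompressed size of each member before parsing anything. The sketch below uses only the standard zipfile module; the helper name member_sizes and the sample archive are hypothetical.

```python
import os
import tempfile
import zipfile


def member_sizes(zip_path):
    """Return [(name, compressed bytes, uncompressed bytes)] for an archive."""
    with zipfile.ZipFile(zip_path, "r") as z:
        return [(info.filename, info.compress_size, info.file_size)
                for info in z.infolist()]


# Demo with a small stand-in archive (hypothetical content).
payload = '{"meta": {"last_updated": "2016-11-18"}}'
size_dir = tempfile.mkdtemp()
size_zip = os.path.join(size_dir, "sample.json.zip")
with zipfile.ZipFile(size_zip, "w", compression=zipfile.ZIP_DEFLATED) as z:
    z.writestr("sample.json", payload)

sizes = member_sizes(size_zip)
for name, compressed, uncompressed in sizes:
    print(name, compressed, uncompressed)
```

If the uncompressed size fits comfortably in memory, plain json.loads() is probably enough; if not, a streaming parser such as the ijson you mentioned may be worth a look.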