 

Reading zipped JSON files

I downloaded the 2015 adverse drug events data from openFDA and I want to run some analysis with Python.

I cannot get the JSON decoding to work.

I am able to find code snippets for gzip but not for plain zip files.

The error message I get is:

TypeError: the JSON object must be str, not 'bytes'

The JSON files are large. Is jsonstreamer, ijson, or another library the recommended tool?

The JSON file looks like this (after manual unzip):

{
  "meta": {
    "last_updated": "2016-11-18",
    "terms": "https://open.fda.gov/terms/",
    "results": {
      "skip": 0,
      "total": 304100,
      "limit": 25000
    },
    "license": "https://open.fda.gov/license/",
    "disclaimer": "Do not rely on openFDA to make decisions regarding medical care. While we make every effort to ensure that data is accurate, you should assume all results are unvalidated. We may limit or otherwise restrict your access to the API in line with our Terms of Service."
  },

This is my code:

import json
import zipfile

d = None
data = None
with zipfile.ZipFile("./data/drug-event-Q4-0001-of-0013.json.zip", "r") as z:
    for filename in z.namelist():
        print(filename)
        with z.open(filename) as f:
            data = f.read()
            d = json.loads(data)
h.das asked Nov 27 '16 01:11


1 Answer

The data you read from the zipfile are bytes, but the JSON decoder wants text. So, as usual with this kind of issue, you have to decode the bytes into a string before feeding it to the json module.

I'm assuming the JSON files are saved in UTF-8 encoding, so this will do the trick:

d = json.loads(data.decode("utf-8"))

Change the character encoding accordingly if your JSON files use a different encoding.
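
Folded back into the loop from the question, the fix could look roughly like the sketch below. The function name load_zipped_json is made up for illustration, and UTF-8 is only the assumed default. Instead of calling decode() on the whole byte string, it wraps each archive member in io.TextIOWrapper so that json.load receives text.

```python
import io
import json
import zipfile


def load_zipped_json(zip_path, encoding="utf-8"):
    """Return {member name: parsed JSON} for every file in the archive.

    Hypothetical helper; the UTF-8 default is an assumption.
    """
    parsed = {}
    with zipfile.ZipFile(zip_path, "r") as z:
        for filename in z.namelist():
            if filename.endswith("/"):
                continue  # skip directory entries, they hold no data
            with z.open(filename) as f:
                # z.open() yields a binary stream; wrap it so json.load sees text.
                with io.TextIOWrapper(f, encoding=encoding) as text:
                    parsed[filename] = json.load(text)
    return parsed
```

Calling load_zipped_json("./data/drug-event-Q4-0001-of-0013.json.zip") would then give one parsed object per file in the archive. Note that from Python 3.6 onwards json.loads also accepts bytes directly, so the TypeError above only shows up on older versions.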

Regarding your second question: how large is 'large'?
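
One way to put a number on 'large' before picking a parser is to ask the archive itself: zipfile records the compressed and uncompressed size of every member, so nothing has to be extracted. A rough sketch, with a made-up function name:

```python
import zipfile


def member_sizes(zip_path):
    """Return (name, compressed bytes, uncompressed bytes) for each member."""
    with zipfile.ZipFile(zip_path, "r") as z:
        return [
            (info.filename, info.compress_size, info.file_size)
            for info in z.infolist()
        ]
```

If the uncompressed size fits comfortably in memory, plain json.loads on the whole file is probably enough; if not, an incremental parser such as the ijson mentioned in the question may be worth a look.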

Irmen de Jong answered Sep 18 '22 16:09