
What's the best way to load large JSON lists in Python? [duplicate]

I have access to a set of files (around 80–800 MB each). Unfortunately, each file contains only a single line, and that line holds exactly one JSON object (a list of lists). What's the best way to load it and split it into smaller JSON objects?

asked Dec 07 '22 by Sam Odio

2 Answers

Since version 0.21.0, pandas supports a chunksize parameter in read_json (note that it only works together with lines=True, i.e. line-delimited JSON). You can then load and manipulate one chunk at a time:

import pandas as pd

# read 100 records at a time instead of the whole file at once
chunks = pd.read_json('file.json', lines=True, chunksize=100)
for chunk in chunks:
    print(chunk)
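
To get from chunks back to the smaller JSON files the question asks for, each chunk can be written straight out again. Here is a minimal sketch using the same API; the input must be line-delimited JSON, and the part_*.json output names are made up for illustration:

import pandas as pd

# hypothetical file names; assumes the input is line-delimited JSON
reader = pd.read_json('file.json', lines=True, chunksize=100)
for i, chunk in enumerate(reader):
    # each chunk is an ordinary DataFrame, so it can be dumped as JSON
    chunk.to_json(f'part_{i}.json', orient='records')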
answered Dec 09 '22 by VinceP


There is already a similar post on this. Here is the solution proposed there:

import json

with open('file.json') as infile:
    data = json.load(infile)  # loads the whole object into memory
    chunk_size = 1000
    # write each slice of 1000 sub-lists to its own numbered file
    for i in range(0, len(data), chunk_size):
        with open('file_' + str(i // chunk_size) + '.json', 'w') as outfile:
            json.dump(data[i:i + chunk_size], outfile)
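
Note that json.load still pulls the entire 800 MB object into memory before splitting it. If that is a problem, a streaming parser can yield the inner lists one at a time. Below is a minimal sketch using the third-party ijson library (not part of either answer), assuming the top-level JSON value is an array:

import ijson  # third-party: pip install ijson

with open('file.json', 'rb') as infile:
    # the 'item' prefix matches each element of the top-level array,
    # so the sub-lists stream through one at a time instead of all at once
    for i, sublist in enumerate(ijson.items(infile, 'item')):
        print(i, len(sublist))  # stand-in for real per-chunk handling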
answered Dec 09 '22 by Charles Menguy