I have access to a set of files (around 80-800mb each). Unfortunately, there's only one line in every file. The line contains exactly one JSON object (a list of lists). What's the best way to load and parse it into smaller JSON objects?
Since version 0.21.0, pandas supports chunksize as a parameter of read_json, so you can load and process one chunk at a time. Note that chunksize only works together with lines=True, which expects newline-delimited JSON (one object per line):
import pandas as pd

chunks = pd.read_json(file, lines=True, chunksize=100)
for chunk in chunks:
    print(chunk)  # each chunk is a DataFrame
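For a self-contained illustration of the chunked read, here is a small sketch using in-memory JSON Lines data (the sample records are illustrative, not from the question):

```python
import pandas as pd
from io import StringIO

# A tiny JSON Lines sample: one JSON object per line,
# which is the format chunksize/lines=True expects.
jsonl = '{"a": 1}\n{"a": 2}\n{"a": 3}\n'

# read_json returns an iterator of DataFrames when chunksize is set.
reader = pd.read_json(StringIO(jsonl), lines=True, chunksize=2)

total_rows = 0
for chunk in reader:
    total_rows += len(chunk)  # each chunk holds at most 2 rows here

print(total_rows)  # 3
```

Each chunk is an ordinary DataFrame, so any per-chunk processing (filtering, writing out smaller files, etc.) can go inside the loop without ever holding the whole file in memory.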
There is already a similar post here. Here is the solution they proposed:
import json

with open('file.json') as infile:
    o = json.load(infile)  # note: this loads the entire file into memory

chunk_size = 1000
for i in range(0, len(o), chunk_size):  # range, not Python 2's xrange
    with open('file_' + str(i // chunk_size) + '.json', 'w') as outfile:
        json.dump(o[i:i + chunk_size], outfile)
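To see the split-and-dump approach end to end, here is a self-contained sketch (the sample data, chunk size, and file naming are illustrative, mirroring the snippet above):

```python
import json
import os
import tempfile

# Sample data standing in for the parsed list of lists.
data = [[i, i * 2] for i in range(10)]

chunk_size = 4
out_dir = tempfile.mkdtemp()
paths = []

# Write each slice of the list to its own smaller JSON file.
for i in range(0, len(data), chunk_size):
    path = os.path.join(out_dir, 'file_' + str(i // chunk_size) + '.json')
    with open(path, 'w') as outfile:
        json.dump(data[i:i + chunk_size], outfile)
    paths.append(path)

# Read one chunk back to confirm the round trip.
with open(paths[0]) as infile:
    first_chunk = json.load(infile)

print(len(paths))      # 3 files for 10 items in chunks of 4
print(first_chunk[0])  # [0, 0]
```

Keep in mind this approach still parses the whole input with json.load first, so it needs enough RAM for the full object; the pandas chunked read above avoids that, but only for newline-delimited JSON.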