I'm trying to load a large file (2GB in size) filled with JSON strings, delimited by newlines. Ex:
```
{ "key11": value11, "key12": value12 }
{ "key21": value21, "key22": value22 }
…
```
The way I'm importing it now is:
```python
content = open(file_path, "r").read()
j_content = json.loads("[" + content.replace("}\n{", "},\n{") + "]")
```
Which seems like a hack (adding commas between each JSON string and also a beginning and ending square bracket to make it a proper list).
Is there a better way to specify the JSON delimiter (newline \n instead of comma ,)?
Also, Python can't seem to properly allocate memory for an object built from 2GB of data. Is there a way to construct each JSON object as I'm reading the file line by line? Thanks!
Python read JSON file line by line: Step 1: import the json module. Step 2: create an empty list lineByLine. Step 3: open the JSON file using open() and store the file object in a variable. Step 4: convert each line from JSON to Python using loads() and store the result in a db variable. Step 5: append db to the lineByLine list.
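The steps above can be sketched as follows; the file name and contents here are made-up placeholders so the snippet is self-contained:

```python
import json

# Create a small sample newline-delimited JSON file (placeholder data)
with open("data.jsonl", "w") as f:
    f.write('{"key11": 1, "key12": 2}\n{"key21": 3, "key22": 4}\n')

lineByLine = []                         # Step 2: empty list for parsed objects
with open("data.jsonl", "r") as file:   # Step 3: open the file
    for line in file:
        db = json.loads(line)           # Step 4: JSON text -> Python object
        lineByLine.append(db)           # Step 5: collect the result
```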
To load big JSON files in a memory-efficient and fast way with Python, we can use the ijson library. We call ijson.parse to parse the file opened by open(). It yields tuples of the key prefix, the data type of the JSON event (the_type), and the value of the entry with the given key prefix.
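A minimal sketch of that ijson usage, assuming the third-party ijson package is installed (the helper name scalar_entries is mine, not part of the library):

```python
import io

try:
    import ijson  # third-party: pip install ijson
except ImportError:
    ijson = None  # degrade gracefully if the library is absent

def scalar_entries(fileobj):
    """Yield (prefix, the_type, value) for each scalar event in the stream."""
    for prefix, the_type, value in ijson.parse(fileobj):
        if the_type in ("string", "number", "boolean", "null"):
            yield prefix, the_type, value

if ijson is not None:
    # ijson reads incrementally, so a 2GB file never sits in memory at once;
    # a small in-memory buffer stands in for the real file here.
    data = io.BytesIO(b'{"name": "a", "nested": {"n": 1}}')
    for entry in scalar_entries(data):
        print(entry)
```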
Line-delimited JSON can be read by a parser that can handle concatenated JSON. Concatenated JSON that contains newlines within a JSON object can't be read by a line-delimited JSON parser. The terms "line-delimited JSON" and "newline-delimited JSON" are often used without clarifying if embedded newlines are supported.
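When documents may contain embedded newlines, the standard library's json.JSONDecoder.raw_decode can walk a string of concatenated JSON document by document. A sketch (the function name iter_concatenated_json is mine):

```python
import json

def iter_concatenated_json(text):
    """Yield each JSON document from a string of concatenated JSON,
    even when a document contains embedded newlines."""
    decoder = json.JSONDecoder()
    idx = 0
    length = len(text)
    while idx < length:
        # skip whitespace between documents
        while idx < length and text[idx].isspace():
            idx += 1
        if idx >= length:
            break
        # raw_decode returns the parsed object and the index where it ended
        obj, end = decoder.raw_decode(text, idx)
        yield obj
        idx = end

docs = list(iter_concatenated_json('{"a":\n 1} {"b": 2}'))
```

Note that this still requires the whole string in memory; for a 2GB file it illustrates the parsing technique rather than a full streaming solution.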
Just read each line and construct a JSON object as you go:
```python
with open(file_path) as f:
    for line in f:
        j_content = json.loads(line)
```
This way, you load a proper, complete JSON object (provided there is no \n in a JSON value somewhere or in the middle of your JSON object), and you avoid memory issues because each object is created only when needed.
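For a 2GB file the key is to process each object and then let it go, rather than accumulating everything in a list. A sketch under assumed details (the path and the "count" field are hypothetical):

```python
import json

def sum_counts(path):
    """Stream a newline-delimited JSON file, keeping only one parsed
    object in memory at a time ("count" is a made-up example field)."""
    total = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            obj = json.loads(line)
            total += obj.get("count", 0)
    return total
```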
There is also this answer: https://stackoverflow.com/a/7795029/671543
```python
contents = open(file_path, "r").read()
data = [json.loads(str(item)) for item in contents.strip().split('\n')]
```