How to read line-delimited JSON from large file (line by line)

I'm trying to load a large file (2GB in size) filled with JSON strings, delimited by newlines. Ex:

{
    "key11": value11,
    "key12": value12,
}
{
    "key21": value21,
    "key22": value22,
}
…

The way I'm importing it now is:

content = open(file_path, "r").read()
j_content = json.loads("[" + content.replace("}\n{", "},\n{") + "]")

Which seems like a hack (adding commas between each JSON string and also a beginning and ending square bracket to make it a proper list).

Is there a better way to specify the JSON delimiter (newline \n instead of comma ,)?

Also, Python can't seem to properly allocate memory for an object built from 2 GB of data. Is there a way to construct each JSON object as I read the file line by line? Thanks!

asked Feb 03 '14 by Cat

People also ask

How do you read a JSON file line by line in Python?

Python read JSON file line by line: Step 1: import the json module. Step 2: open the file using open(). Step 3: read it one line at a time. Step 4: convert each line from JSON to Python using json.loads() and store the result in a db variable. Step 5: append db to an initially empty lineByLine list. See the sketch below.
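A minimal sketch of those steps, assuming a newline-delimited file (the file name data.ndjson and the variable names are illustrative):

import json                          # Step 1: import the json module

lineByLine = []
with open("data.ndjson") as f:       # Step 2: open the file
    for line in f:                   # Step 3: read it one line at a time
        db = json.loads(line)        # Step 4: convert the line from JSON to Python
        lineByLine.append(db)        # Step 5: append db to the list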

How do I load a large JSON file in Python?

To load big JSON files in a memory-efficient and fast way with Python, we can use the ijson library. We call ijson.parse() to parse the file opened by open(). We can then print the key prefix, the data type of the JSON value (stored in the_type), and the value of the entry with the given key prefix.
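A minimal sketch of that approach, assuming the third-party ijson library is installed (pip install ijson); the file name is hypothetical:

import ijson

with open("large.json", "rb") as f:                  # ijson works on binary file objects
    for prefix, the_type, value in ijson.parse(f):
        print(prefix, the_type, value)               # key prefix, event/data type, value

Because ijson.parse() streams events one at a time, memory use stays flat regardless of file size.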

Are JSON files delimited?

Line-delimited JSON can be read by a parser that can handle concatenated JSON. Concatenated JSON that contains newlines within a JSON object can't be read by a line-delimited JSON parser. The terms "line-delimited JSON" and "newline-delimited JSON" are often used without clarifying if embedded newlines are supported.
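To illustrate the difference, here is a minimal sketch of a concatenated-JSON reader built on the standard library's json.JSONDecoder.raw_decode(), which parses one value and reports where it ended; the function name is illustrative:

import json

def iter_concatenated_json(text):
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # Skip whitespace, including the newlines between (or inside) objects.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        obj, idx = decoder.raw_decode(text, idx)  # parse one value, get the index past it
        yield obj

for obj in iter_concatenated_json('{"a": 1}\n{"b":\n2}'):  # newline inside the second object
    print(obj)

Because it tracks positions rather than splitting on newlines, this reader also handles a newline embedded inside an object, which a pure line-delimited parser would not.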


2 Answers

Just read each line and construct a JSON object as you go:

with open(file_path) as f:
    for line in f:
        j_content = json.loads(line)

This way, you load a proper, complete JSON object per iteration (provided there is no \n inside a JSON value or in the middle of an object), and you avoid memory issues, since each object is created only when needed.
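If you want to reuse this, a small generator wrapper is convenient (parse_ndjson and the file name are hypothetical, not part of the answer above):

import json

def parse_ndjson(path):
    # Yield one parsed object per non-empty line of a newline-delimited JSON file.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:                 # skip blank lines, e.g. a trailing newline at EOF
                yield json.loads(line)

for obj in parse_ndjson("data.ndjson"):
    print(obj)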

There is also this answer:

https://stackoverflow.com/a/7795029/671543

answered Sep 28 '22 by njzk2


# Note: unlike the answer above, this reads the whole file into memory at once.
contents = open(file_path, "r").read()
data = [json.loads(item) for item in contents.strip().split('\n')]
answered Sep 28 '22 by Tjorriemorrie