
Using python ijson to read a large json file with multiple json objects

Tags: python, json

I'm trying to parse a large (~100 MB) JSON file using the ijson package, which lets me work through the file efficiently. However, after writing some code like this:

import ijson

with open(filename, 'r') as f:
    parser = ijson.parse(f)
    for prefix, event, value in parser:
        if prefix == "name":
            print(value)

I found that the code parses only the first line and none of the remaining lines in the file.

Here is what a portion of my JSON file looks like:

{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.012000}
{"name":"engine_speed","value":772,"timestamp":1364323939.027000}
{"name":"vehicle_speed","value":0,"timestamp":1364323939.029000}
{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.035000}

I think ijson parses only one JSON object.
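One way to check that guess without ijson is sketched below. It uses only the standard `json` module and two records copied from the excerpt above. The variable names are illustrative. The whole text is rejected as a single document, while each line parses on its own.

```python
import json

# Two sample records copied from the excerpt above, one JSON object per line.
sample = (
    '{"name":"engine_speed","value":772,"timestamp":1364323939.027000}\n'
    '{"name":"vehicle_speed","value":0,"timestamp":1364323939.029000}\n'
)

# Parsing the whole text as one document fails once the first object ends.
try:
    json.loads(sample)
    whole_text_is_valid = True
    error_message = None
except json.JSONDecodeError as exc:
    whole_text_is_valid = False
    error_message = exc.msg  # short reason reported by the decoder

# Each line taken separately is a complete JSON document.
records = [json.loads(line) for line in sample.splitlines()]
names = [record["name"] for record in records]

print(whole_text_is_valid, error_message)
print(names)
```

So the file is a sequence of JSON documents rather than one document, which would explain why a single-document parser stops after the first object.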

Can someone please suggest how to work around this?

asked May 13 '16 02:05 by Boubouh Karim




1 Answer

Since the provided chunk looks like a set of lines, each holding an independent JSON object, it should be parsed line by line:

# Each JSON object is small, so there is no need for iterative parsing.
import json

with open(filename, 'r') as f:
    for line in f:
        data = json.loads(line)
        # data['name'], data['value'] and data['timestamp'] now
        # contain the corresponding values
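The line-by-line approach can also be wrapped in a generator so that only one record is held in memory at a time. The following is a minimal sketch, not part of the original answer: the helper name `iter_json_lines` and the temporary demo file are hypothetical, and it assumes every non-empty line is one complete JSON object, as in the excerpt from the question.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterator


def iter_json_lines(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed object per non-empty line of the file at `path`."""
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines such as a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report which line was malformed instead of failing silently.
                raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc


# Demo on a throwaway file holding three records from the question.
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.012000}\n')
    tmp.write('{"name":"engine_speed","value":772,"timestamp":1364323939.027000}\n')
    tmp.write('\n')
    tmp.write('{"name":"vehicle_speed","value":0,"timestamp":1364323939.029000}\n')
    demo_path = tmp.name

engine_speeds = [
    record["value"]
    for record in iter_json_lines(demo_path)
    if record["name"] == "engine_speed"
]
os.remove(demo_path)

print(engine_speeds)  # prints [772]
```

Because the file object is iterated lazily, memory use stays roughly constant regardless of file size, which is the property the question was after with ijson.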
answered Sep 16 '22 12:09 by user3159253