Using python ijson to read a large json file with multiple json objects

Tags: python, json

I'm trying to parse a large (~100 MB) JSON file using the ijson package, which lets me process the file incrementally instead of loading it all into memory. However, after writing code like this:

import ijson

with open(filename, 'r') as f:
    parser = ijson.parse(f)
    for prefix, event, value in parser:
        if prefix == "name":
            print(value)

I found that it parses only the first line of the file and ignores the rest.

Here is what a portion of my JSON file looks like:

{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.012000}
{"name":"engine_speed","value":772,"timestamp":1364323939.027000}
{"name":"vehicle_speed","value":0,"timestamp":1364323939.029000}
{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.035000}

I suspect ijson parses only a single JSON object and stops there.

Can someone please suggest how to work around this?

asked May 13 '16 02:05

Boubouh Karim

1 Answer

Since the provided sample looks like a series of lines, each containing an independent JSON object, it should be parsed line by line:

# each JSON object is small, so there is no need for iterative parsing
import json

with open(filename, 'r') as f:
    for line in f:
        data = json.loads(line)
        # data['name'], data['value'] and data['timestamp'] now
        # contain the corresponding values
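
If the file may contain blank lines or an occasional malformed record, the same idea can be wrapped in a generator. This is only a sketch: the helper name `iter_json_lines` is made up for illustration, and the sample data is copied from the question and fed through `io.StringIO` in place of a real file.

```python
import io
import json


def iter_json_lines(lines):
    """Yield one parsed object per non-empty line (hypothetical helper)."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line was bad instead of a bare decode error
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc


# Stand-in for open(filename, 'r'); any iterable of text lines works
sample = io.StringIO(
    '{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.012000}\n'
    '{"name":"engine_speed","value":772,"timestamp":1364323939.027000}\n'
    '\n'
    '{"name":"vehicle_speed","value":0,"timestamp":1364323939.029000}\n'
)

names = [record["name"] for record in iter_json_lines(sample)]
print(names)
```

Because the file object is iterated lazily, only one line is held in memory at a time, so the ~100 MB size is not a problem. If you would rather stay with ijson, recent versions document a `multiple_values=True` option for inputs with several top-level values; check the documentation for the version you have installed.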

answered Sep 16 '22 12:09

user3159253
