 

How to find unique values in a large JSON file?

Tags:

python

json

I have two JSON files: data_large (150.1 MB) and data_small (7.5 KB). The content of each file looks like [{"score": 68}, {"score": 78}]. I need to find the list of unique scores from each file.

While dealing with data_small, I did the following and was able to view its content in about 0.1 seconds.

import json

with open('data_small') as f:
    content = json.load(f)

print(content)  # I'll be applying the logic to find the unique values later.

But while dealing with data_large, I did the same thing and my system hung and slowed down; I had to force-quit it to bring it back to normal speed. It took around 2 minutes just to print the content.

with open('data_large') as f:
    content = json.load(f)

print(content)  # I'll be applying the logic to find the unique values later.

How can I increase the efficiency of the program while dealing with large data-sets?

asked Jan 04 '14 by python-coder



1 Answer

Since your JSON file is not that large and you can afford to load it into RAM all at once, you can get all the unique values like this:

import json

with open('data_large') as f:
    content = json.load(f)

# Do not print the content: writing it all to stdout is what makes it slow.

# Collect the unique scores
values = set()
for item in content:
    values.add(item['score'])

# The loop above uses less memory than building a full list of all scores
# first and then deduplicating it, i.e. values = set([i['score'] for i in content])

# It is faster to save the results to a file than to print them
with open('results.json', 'w') as fid:
    # json can't serialize sets, hence the conversion to a list
    json.dump(list(values), fid)

If you need to process even bigger files, look for libraries that can parse a JSON file iteratively instead of loading it all at once.
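For example, one option (not named in the original answer, so treat this as a suggestion) is the third-party ijson package, which streams the elements of a top-level JSON array one at a time. A minimal sketch, assuming ijson is installed (pip install ijson):

import json
import ijson  # third-party streaming JSON parser

values = set()
with open('data_large', 'rb') as f:
    # 'item' matches each element of the top-level array,
    # so only one object is held in memory at a time
    for obj in ijson.items(f, 'item'):
        values.add(obj['score'])

with open('results.json', 'w') as fid:
    json.dump(list(values), fid)

This keeps memory usage roughly constant regardless of file size, at the cost of somewhat slower parsing than json.load.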

answered Sep 19 '22 by miki725