
Load part of a json in python

Tags:

python

I have a JSON file with about 1000 data entries. For example:

{"1":"Action","2":"Adventure",....."1000":"Mystery"}

The above is just an example.

I am using json.load after importing json. How do I load only the first 10 data entries from the JSON?

{"1":"Action","2":"Adventure",....."10":"Thriller"}
cann0nextr3me asked Sep 18 '15 21:09



3 Answers

JSON objects are unordered by specification, and Python dictionaries made no ordering guarantee before Python 3.7. You also cannot control how much of an object is loaded, at least not with the standard library json module.

After loading, you could take the ten key-value pairs with the lowest key value:

import heapq
import json

data = json.loads(json_string)
limited = {k: data[k] for k in heapq.nsmallest(10, data, key=int)}

The heapq.nsmallest() will efficiently pick out the 10 smallest keys regardless of the size of data.
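
As a quick check of the key-selection step, here is a minimal sketch with made-up sample data (the contents of json_string are an assumption for illustration only). Note that heapq.nsmallest() takes the count first and the iterable second.

```python
import heapq
import json

# Hypothetical sample input: keys "20" down to "1", deliberately not ascending.
json_string = json.dumps({str(n): "genre%d" % n for n in range(20, 0, -1)})

data = json.loads(json_string)
# Iterating a dict yields its keys; key=int compares them numerically,
# so "10" ranks after "9" instead of right after "1".
smallest_keys = heapq.nsmallest(10, data, key=int)
limited = {k: data[k] for k in smallest_keys}

print(sorted(limited, key=int))
```

The whole document is still parsed here; only the selection afterwards is cheap.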

Of course, if the keys are always consecutive and always start at 1, you may as well use a range() here:

data = json.loads(json_string)
limited = {str(k): data[str(k)] for k in range(1, 11)}

If you want to capture the objects in file definition order you could use the object_pairs_hook argument to json.load() and json.loads():

class FirstTenDict(dict):
    def __init__(self, pairs):
        super(FirstTenDict, self).__init__(pairs[:10])

data = json.loads(json_string, object_pairs_hook=FirstTenDict)

Demo of the latter approach:

>>> import json
>>> class FirstTenDict(dict):
...     def __init__(self, pairs):
...         super(FirstTenDict, self).__init__(pairs[:10])
... 
>>> json_data = '''\
... {"foo42": "bar", "foo31": "baz", "foo10": "spam", "foo44": "ham", "foo1": "eggs",
...  "foo24": "vikings", "foo21": "monty", "foo88": "python", "foo11": "eric", "foo65": "idle",
...  "foo13": "will", "foo31": "be", "foo76": "ignored"}
... '''
>>> json.loads(json_data)
{'foo1': 'eggs', 'foo88': 'python', 'foo44': 'ham', 'foo10': 'spam', 'foo76': 'ignored', 'foo42': 'bar', 'foo24': 'vikings', 'foo11': 'eric', 'foo31': 'be', 'foo13': 'will', 'foo21': 'monty', 'foo65': 'idle'}
>>> json.loads(json_data, object_pairs_hook=FirstTenDict)
{'foo1': 'eggs', 'foo88': 'python', 'foo44': 'ham', 'foo10': 'spam', 'foo24': 'vikings', 'foo11': 'eric', 'foo21': 'monty', 'foo42': 'bar', 'foo31': 'baz', 'foo65': 'idle'}
>>> import pprint
>>> pprint.pprint(_)
{'foo1': 'eggs',
 'foo10': 'spam',
 'foo11': 'eric',
 'foo21': 'monty',
 'foo24': 'vikings',
 'foo31': 'baz',
 'foo42': 'bar',
 'foo44': 'ham',
 'foo65': 'idle',
 'foo88': 'python'}
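
The same hook can also be written as a plain function. The sketch below assumes Python 3.7 or later, where dict keeps insertion order; the name first_ten and the sample data are made up for illustration. One caveat worth knowing: object_pairs_hook is called for every JSON object in the document, so nested objects get truncated too.

```python
import json

def first_ten(pairs):
    # pairs is a list of (key, value) tuples in document order;
    # keep only the first ten of them.
    return dict(pairs[:10])

# Hypothetical sample document with 12 keys in no particular order.
numbers = (42, 7, 19, 3, 88, 1, 65, 23, 50, 11, 99, 2)
json_data = json.dumps({"k%d" % n: n for n in numbers})

limited = json.loads(json_data, object_pairs_hook=first_ten)
print(list(limited))

# The hook also applies to nested objects, which may or may not be wanted.
inner = {"i%d" % n: n for n in range(12)}
nested = json.loads(json.dumps({"outer": inner}), object_pairs_hook=first_ten)
print(len(nested["outer"]))
```

If only the top-level object should be limited, it is probably simpler to load normally and slice afterwards.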
Martijn Pieters answered Oct 27 '22 16:10


You can iteratively parse json (that is to say, not "all at once") using ijson, and assuming your input really is as simple as your example:

import heapq
import itertools

import ijson

def iter_items(parser):
    # yield a (key, value) pair for every string value in the stream
    for prefix, event, value in parser:
        if event == 'string':
            yield prefix, value

with open('filename.json', 'rb') as infile:
    items = iter_items(ijson.parse(infile))
    # choose one of the following (items is a generator, so it can only be consumed once)
    # first 10 items from the file regardless of keys
    print(dict(itertools.islice(items, 10)))
    # least 10 keys when considered as integers
    print(dict(heapq.nsmallest(10, items, key=lambda p: int(p[0]))))

Obviously the second of these still has to read the whole file; it just doesn't have to keep the whole file in memory at once. Avoiding that is premature optimization for only 1000 small key-value pairs, but whatever. I found the question interesting enough to try a library I'd never considered before, because JSON files are sometimes huge, and because of the close analogy with SAX parsers (which are event-based streaming parsers for XML).

By the way, if order was important then the producer of this JSON probably should put an array in the JSON. But perhaps as consumer you can't do anything about that.
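
To illustrate the array suggestion, here is a minimal sketch of what consuming such a file could look like. The genre names are made up, and it assumes the producer can be changed to emit an array, which may not be possible.

```python
import json

# Hypothetical producer output: an array makes the order explicit.
genres = ["Action", "Adventure", "Comedy", "Crime", "Drama", "Fantasy",
          "Horror", "Musical", "Romance", "Thriller", "Western", "Mystery"]
json_string = json.dumps(genres)

# The whole array is still parsed, but "the first 10" is now well defined.
first_ten = json.loads(json_string)[:10]
print(first_ten)
```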

Steve Jessop answered Oct 27 '22 18:10


import json

file = 'data.json'
with open(file, 'rb') as f:
    content = json.load(f)

# keep the entries whose keys are 1 through 10; note that the keys become ints
what_you_want = {int(k): v for k, v in content.items() if int(k) in range(1, 11)}

I don't think there is any other way. You must load the entire thing, and only then can you extract the keys you want.

DevLounge answered Oct 27 '22 18:10