I have a JSON file with about 1,000 data entries. For example:
{"1":"Action","2":"Adventure",....."1000":"Mystery"}
The above is just an example.
I am using json.load after importing the json module. How do I load only the first 10 data entries from the JSON, so that I get:
{"1":"Action","2":"Adventure",....."10":"Thriller"}
Use the json.loads() function. It accepts a valid JSON string and converts it to a Python dictionary. This process is called deserialization – the act of converting a string to an object.
Loading a JSON object in Python is straightforward. Python has a built-in package called json for working with JSON data, and its loads() and load() methods are the ones that help you read a JSON file.
json.loads() parses a valid JSON string and converts it into a Python dictionary. It is mainly used for deserializing a string, bytes, or bytearray containing JSON data into a Python dictionary.
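A minimal sketch of the difference between the two, assuming a file named data.json (the filename is an assumption, not from the question):

import json

# json.loads() deserializes a JSON string into a Python dict
genres = json.loads('{"1": "Action", "2": "Adventure", "3": "Mystery"}')
print(genres["2"])  # Adventure

# json.load() does the same, but reads from an open file object
with open('data.json') as f:
    genres = json.load(f)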
JSON objects, like Python dictionaries, have no order. You also cannot control how much of an object is loaded, at least not with the standard library json module.
After loading, you could take the ten key-value pairs with the lowest key value:
import heapq
import json

data = json.loads(json_string)
# keys are strings, so compare them as integers to find the 10 smallest
limited = {k: data[k] for k in heapq.nsmallest(10, data, key=int)}
The heapq.nsmallest() call will efficiently pick out the 10 smallest keys regardless of the size of data.
Of course, if the keys are always consecutive and always start at 1, you may as well use a range() here:
data = json.loads(json_string)
limited = {str(k): data[str(k)] for k in range(1, 11)}
If you want to capture the objects in file definition order, you could use the object_pairs_hook argument to json.load() and json.loads():
class FirstTenDict(dict):
    def __init__(self, pairs):
        # object_pairs_hook passes in the (key, value) pairs in document order
        super(FirstTenDict, self).__init__(pairs[:10])

data = json.loads(json_string, object_pairs_hook=FirstTenDict)
Demo of the latter approach:
>>> import json
>>> class FirstTenDict(dict):
...     def __init__(self, pairs):
...         super(FirstTenDict, self).__init__(pairs[:10])
...
>>> json_data = '''\
... {"foo42": "bar", "foo31": "baz", "foo10": "spam", "foo44": "ham", "foo1": "eggs",
... "foo24": "vikings", "foo21": "monty", "foo88": "python", "foo11": "eric", "foo65": "idle",
... "foo13": "will", "foo31": "be", "foo76": "ignored"}
... '''
>>> json.loads(json_data)
{'foo1': 'eggs', 'foo88': 'python', 'foo44': 'ham', 'foo10': 'spam', 'foo76': 'ignored', 'foo42': 'bar', 'foo24': 'vikings', 'foo11': 'eric', 'foo31': 'be', 'foo13': 'will', 'foo21': 'monty', 'foo65': 'idle'}
>>> json.loads(json_data, object_pairs_hook=FirstTenDict)
{'foo1': 'eggs', 'foo88': 'python', 'foo44': 'ham', 'foo10': 'spam', 'foo24': 'vikings', 'foo11': 'eric', 'foo21': 'monty', 'foo42': 'bar', 'foo31': 'baz', 'foo65': 'idle'}
>>> import pprint
>>> pprint.pprint(_)
{'foo1': 'eggs',
'foo10': 'spam',
'foo11': 'eric',
'foo21': 'monty',
'foo24': 'vikings',
'foo31': 'baz',
'foo42': 'bar',
'foo44': 'ham',
'foo65': 'idle',
'foo88': 'python'}
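The same hook works with json.load() when reading straight from a file; a minimal sketch, assuming the data sits in a file named data.json as in the question:

import json

class FirstTenDict(dict):
    def __init__(self, pairs):
        super(FirstTenDict, self).__init__(pairs[:10])

with open('data.json') as infile:
    # only the first ten (key, value) pairs from the file end up in the dict
    first_ten = json.load(infile, object_pairs_hook=FirstTenDict)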
You can iteratively parse JSON (that is to say, not "all at once") using ijson, and assuming your input really is as simple as your example:
import heapq
import itertools
import ijson

def iter_items(parser):
    # the parser yields (prefix, event, value) tuples; keep the string values
    for prefix, event, value in parser:
        if event == 'string':
            yield prefix, value

with open('filename.json') as infile:
    items = iter_items(ijson.parse(infile))
    # choose one of the following
    # first 10 items from the file regardless of keys
    print(dict(itertools.islice(items, 10)))
    # least 10 keys when considered as integers
    print(dict(heapq.nsmallest(10, items, key=lambda p: int(p[0]))))
Obviously the second of these would still have to read the whole file; it just doesn't have to keep the whole file in memory at once. Avoiding that is premature optimization for only 1000 small key-value pairs, but whatever. I found the question interesting enough to use a library I'd never considered before, because sometimes JSON files are huge, and because of the close analogy with SAX parsers (which are event-based streaming parsers for XML).
By the way, if order is important then the producer of this JSON should probably have put an array in the JSON. But perhaps, as the consumer, you can't do anything about that.
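For illustration, a rough sketch of what that array form would look like; the values are just the question's example genres:

import json

# a JSON array preserves order, so "the first ten" is simply a slice
genres = json.loads('["Action", "Adventure", "Mystery"]')
first_ten = genres[:10]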
import json

with open('data.json') as f:
    content = json.load(f)

# keep only the entries whose keys are 1 through 10
what_you_want = {int(k): v for k, v in content.items() if int(k) in range(1, 11)}
I don't think there is any other way. You must load the entire thing, and only then can you extract the keys you want.
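A small sketch of that load-everything-then-extract approach, with the same assumed filename as above; on Python 3.7+ the parsed dict also keeps the file's entry order, so slicing its items gives the first ten as written:

import itertools
import json

with open('data.json') as f:
    content = json.load(f)

# the first ten (key, value) pairs of the parsed dict
first_ten = dict(itertools.islice(content.items(), 10))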