I have a JSON file with about 1,000 data entries. For example:
{"1":"Action","2":"Adventure",....."1000":"Mystery"}
The above is just an example.
I am using json.load after importing the json module. How do I load only the first 10 data entries from the JSON, so that I get:
{"1":"Action","2":"Adventure",....."10":"Thriller"}
Use the json.loads() function. It accepts a valid JSON string and converts it to a Python dictionary. This process is called deserialization – the act of converting a string to an object.
Loading a JSON object in Python is straightforward. Python has a built-in package called json for working with JSON data, and its loads() and load() methods are the ones that help you read a JSON file.
json.loads() parses a valid JSON string and converts it into a Python dictionary. It is mainly used for deserializing a string, bytes, or bytearray containing JSON data into a Python dictionary.
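A minimal sketch of the difference between the two, assuming a file named data.json (the filename is an assumption, not from the question):

import json

# json.loads() deserializes a JSON string into a Python dict
genres = json.loads('{"1": "Action", "2": "Adventure", "3": "Mystery"}')
print(genres["2"])  # Adventure

# json.load() does the same, but reads from an open file object
with open('data.json') as f:
    genres = json.load(f)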
JSON objects, like Python dictionaries, have no order. You also cannot control how much of an object is loaded, at least not with the standard library json module.
After loading, you could take the ten key-value pairs with the lowest key value:
import heapq
import json

data = json.loads(json_string)
# keys are strings, so compare them as integers to find the 10 smallest
limited = {k: data[k] for k in heapq.nsmallest(10, data, key=int)}
The heapq.nsmallest() call will efficiently pick out the 10 smallest keys regardless of the size of data.
Of course, if the keys are always consecutive and always start at 1, you may as well use a range() here:
data = json.loads(json_string)
limited = {str(k): data[str(k)] for k in range(1, 11)}
If you want to capture the objects in file definition order, you could use the object_pairs_hook argument to json.load() and json.loads():
class FirstTenDict(dict):
    def __init__(self, pairs):
        # object_pairs_hook passes in the (key, value) pairs in document order
        super(FirstTenDict, self).__init__(pairs[:10])

data = json.loads(json_string, object_pairs_hook=FirstTenDict)
Demo of the latter approach:
>>> import json
>>> class FirstTenDict(dict):
...     def __init__(self, pairs):
...         super(FirstTenDict, self).__init__(pairs[:10])
...
>>> json_data = '''\
... {"foo42": "bar", "foo31": "baz", "foo10": "spam", "foo44": "ham", "foo1": "eggs",
... "foo24": "vikings", "foo21": "monty", "foo88": "python", "foo11": "eric", "foo65": "idle",
... "foo13": "will", "foo31": "be", "foo76": "ignored"}
... '''
>>> json.loads(json_data)
{'foo1': 'eggs', 'foo88': 'python', 'foo44': 'ham', 'foo10': 'spam', 'foo76': 'ignored', 'foo42': 'bar', 'foo24': 'vikings', 'foo11': 'eric', 'foo31': 'be', 'foo13': 'will', 'foo21': 'monty', 'foo65': 'idle'}
>>> json.loads(json_data, object_pairs_hook=FirstTenDict)
{'foo1': 'eggs', 'foo88': 'python', 'foo44': 'ham', 'foo10': 'spam', 'foo24': 'vikings', 'foo11': 'eric', 'foo21': 'monty', 'foo42': 'bar', 'foo31': 'baz', 'foo65': 'idle'}
>>> import pprint
>>> pprint.pprint(_)
{'foo1': 'eggs',
'foo10': 'spam',
'foo11': 'eric',
'foo21': 'monty',
'foo24': 'vikings',
'foo31': 'baz',
'foo42': 'bar',
'foo44': 'ham',
'foo65': 'idle',
'foo88': 'python'}
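The same hook works with json.load() when reading straight from a file; a minimal sketch, assuming the data sits in a file named data.json as in the question:

import json

class FirstTenDict(dict):
    def __init__(self, pairs):
        super(FirstTenDict, self).__init__(pairs[:10])

with open('data.json') as infile:
    # only the first ten (key, value) pairs from the file end up in the dict
    first_ten = json.load(infile, object_pairs_hook=FirstTenDict)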
You can iteratively parse JSON (that is to say, not "all at once") using ijson, and assuming your input really is as simple as your example:
import heapq
import itertools
import ijson

def iter_items(parser):
    # the parser yields (prefix, event, value) tuples; keep the string values
    for prefix, event, value in parser:
        if event == 'string':
            yield prefix, value

with open('filename.json') as infile:
    items = iter_items(ijson.parse(infile))
    # choose one of the following
    # first 10 items from the file regardless of keys
    print(dict(itertools.islice(items, 10)))
    # least 10 keys when considered as integers
    print(dict(heapq.nsmallest(10, items, key=lambda p: int(p[0]))))
Obviously the second of these would still have to read the whole file; it just doesn't have to keep the whole file in memory at once. Avoiding that is premature optimization for only 1000 small key-value pairs, but whatever. I found the question interesting enough to use a library I'd never considered before, because sometimes JSON files are huge, and because of the close analogy with SAX parsers (which are event-based streaming parsers for XML).
By the way, if order is important then the producer of this JSON should probably have put an array in the JSON. But perhaps, as the consumer, you can't do anything about that.
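For illustration, a rough sketch of what that array form would look like; the values are just the question's example genres:

import json

# a JSON array preserves order, so "the first ten" is simply a slice
genres = json.loads('["Action", "Adventure", "Mystery"]')
first_ten = genres[:10]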
import json

with open('data.json') as f:
    content = json.load(f)

# keep only the entries whose keys are 1 through 10
what_you_want = {int(k): v for k, v in content.items() if int(k) in range(1, 11)}
I don't think there is any other way. You must load the entire thing, and only then can you extract the keys you want.
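A small sketch of that load-everything-then-extract approach, with the same assumed filename as above; on Python 3.7+ the parsed dict also keeps the file's entry order, so slicing its items gives the first ten as written:

import itertools
import json

with open('data.json') as f:
    content = json.load(f)

# the first ten (key, value) pairs of the parsed dict
first_ten = dict(itertools.islice(content.items(), 10))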