 

Performance issues formatting using .json()

I am trying to load data from a file located at some URL. I use requests to get it (this happens plenty fast). However, it takes about 10 minutes to use r.json() to format part of the dictionary. How can I speed this up?

import requests

# Fetch the ten seed-data files and collect each file's 'matches' list.
match_list = []
for i in range(1, 11):
    r = requests.get('https://s3-us-west-1.amazonaws.com/riot-api/seed_data/matches%d.json' % i)
    print('matches %d of 10 loaded' % i)
    # This is the slow part: r.json() parses the whole response body.
    match_list.append(r.json()['matches'])
    print('list %d of 10 created' % i)

match_histories = {}
match_histories['matches'] = match_list

I know that there is a related question here: Performance problem transforming JSON data, but I don't see how I can apply that to my case. Thanks! (I'm using Python 3.)

Edit:

I have been given quite a few suggestions that seem promising, but with each I hit a roadblock.

  • I would like to try cjson, but I cannot install it (pip can't find MS Visual C++ 10.0; I tried an installation route that uses Lua, but I need cl in my path to begin with).

  • json.loads(r.content) causes a TypeError in Python 3.

  • I'm not sure how to get ijson working (a rough sketch of the suggested usage is below this list).

  • ujson seems to take about as long as json.

  • json.loads(r.text.encode('utf-8').decode('utf-8')) takes just as long too.
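
For reference, the kind of ijson usage I have been pointed toward looks roughly like the sketch below. It's untested on my end, and it assumes ijson is installed and that the top-level key is 'matches', as in my code above:

import io
import ijson
import requests

# Stream the 'matches' array item by item instead of parsing the whole document at once.
r = requests.get('https://s3-us-west-1.amazonaws.com/riot-api/seed_data/matches1.json')
matches = list(ijson.items(io.BytesIO(r.content), 'matches.item'))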

Mark asked Jan 29 '26 at 22:01


2 Answers

The built-in JSON parser isn't particularly fast. I tried another parser, python-cjson, like so:

import requests
import cjson

r = requests.get('https://s3-us-west-1.amazonaws.com/riot-api/seed_data/matches1.json')
print(cjson.decode(r.content))

The whole program took 3.7 seconds on my laptop, including fetching the data and formatting the output for display.

Edit: Wow, we were all on the wrong track. json isn't slow; Requests's charset detection is painfully slow. Try this instead:

import requests
import json

r = requests.get('https://s3-us-west-1.amazonaws.com/riot-api/seed_data/matches1.json')
r.encoding = 'UTF-8'
print(json.loads(r.text))

The json.loads part takes 1.5s on the same laptop. That's still slower than cjson.decode (at only .62s), but may be fast enough that you won't care if this isn't something you run very frequently. Caveat: I've only benchmarked this on Python 2, and it might be different on Python 3.
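
If you want to reproduce the comparison yourself, a rough timing harness along these lines should do it (a minimal sketch; it assumes the same S3 URL as above and that python-cjson is importable for the second measurement):

import time
import json
import requests

r = requests.get('https://s3-us-west-1.amazonaws.com/riot-api/seed_data/matches1.json')
r.encoding = 'UTF-8'  # avoid the slow charset detection when reading r.text
text = r.text

start = time.time()
json.loads(text)
print('json.loads: %.2f seconds' % (time.time() - start))

try:
    import cjson
    start = time.time()
    cjson.decode(r.content)  # same call as in the first example above
    print('cjson.decode: %.2f seconds' % (time.time() - start))
except ImportError:
    pass  # python-cjson is not installed (it doesn't build on Python 3)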

Edit 2: It seems cjson doesn't install on Python 3. That's OK: json.loads in this version only takes .54 seconds. Charset detection is still glacial, though, and commenting out the r.encoding = 'UTF-8' line still makes the test script run in O(eternal) time. If you can count on those files always being UTF-8 encoded, I think the performance secret is to put that information in your script so that it doesn't have to figure it out at runtime. With that boost, you don't need to bother with supplying your own JSON parser. Just run:

import requests

r = requests.get('https://s3-us-west-1.amazonaws.com/riot-api/seed_data/matches1.json')
r.encoding = 'UTF-8'
print(r.json())
Kirk Strauser answered Jan 31 '26 at 21:01


It looks like requests uses simplejson to decode the JSON. If you just get the data with r.content and then use the built-in Python json library, json.loads(r.content) works very quickly. It will raise an error if the JSON is invalid, but that's better than hanging for a long time.
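A note on the TypeError mentioned in the question edit: on Python 3, r.content is bytes, and json.loads only accepts bytes from Python 3.6 onward, so on older versions decoding explicitly should work around it. A small sketch, assuming the files are UTF-8 encoded:

import json
import requests

# Decode the raw bytes ourselves: no charset detection and no bytes/str TypeError.
r = requests.get('https://s3-us-west-1.amazonaws.com/riot-api/seed_data/matches1.json')
data = json.loads(r.content.decode('utf-8'))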

like image 33
BrenBarn Avatar answered Jan 31 '26 22:01

BrenBarn


