 

Performance issues formatting using .json()

I am trying to load data from a file located at some URL. I use requests to get it (this happens plenty fast). However, it takes about 10 minutes to use r.json() to format part of the dictionary. How can I speed this up?

import requests

# Fetch the ten seed-data files and collect each file's 'matches' list.
match_list = []
for i in range(1, 11):
    r = requests.get('https://s3-us-west-1.amazonaws.com/riot-api/seed_data/matches%d.json' % i)
    print('matches %d of 10 loaded' % i)
    # This is the slow part: r.json() parses the whole response body.
    match_list.append(r.json()['matches'])
    print('list %d of 10 created' % i)

match_histories = {}
match_histories['matches'] = match_list

I know that there is a related question here: Performance problem transforming JSON data, but I don't see how I can apply that to my case. Thanks! (I'm using Python 3.)

Edit:

I have been given quite a few suggestions that seem promising, but with each I hit a roadblock.

  • I would like to try cjson, but I cannot install it (pip can't find MS Visual C++ 10.0; I tried an installation route that uses Lua, but I need cl in my path to begin with).

  • json.loads(r.content) causes a TypeError in Python 3.

  • I'm not sure how to get ijson working (a rough sketch of the suggested usage is below this list).

  • ujson seems to take about as long as json.

  • json.loads(r.text.encode('utf-8').decode('utf-8')) takes just as long too.
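
For reference, the kind of ijson usage I have been pointed toward looks roughly like the sketch below. It's untested on my end, and it assumes ijson is installed and that the top-level key is 'matches', as in my code above:

import io
import ijson
import requests

# Stream the 'matches' array item by item instead of parsing the whole document at once.
r = requests.get('https://s3-us-west-1.amazonaws.com/riot-api/seed_data/matches1.json')
matches = list(ijson.items(io.BytesIO(r.content), 'matches.item'))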

Mark asked Jan 29 '26 at 22:01


2 Answers

The built-in JSON parser isn't particularly fast. I tried another parser, python-cjson, like so:

import requests
import cjson

r = requests.get('https://s3-us-west-1.amazonaws.com/riot-api/seed_data/matches1.json')
print(cjson.decode(r.content))

The whole program took 3.7 seconds on my laptop, including fetching the data and formatting the output for display.

Edit: Wow, we were all on the wrong track. json isn't slow; Requests's charset detection is painfully slow. Try this instead:

import requests
import json

r = requests.get('https://s3-us-west-1.amazonaws.com/riot-api/seed_data/matches1.json')
r.encoding = 'UTF-8'
print(json.loads(r.text))

The json.loads part takes 1.5s on the same laptop. That's still slower than cjson.decode (at only .62s), but may be fast enough that you won't care if this isn't something you run very frequently. Caveat: I've only benchmarked this on Python 2, and it might be different on Python 3.
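
If you want to reproduce the comparison yourself, a rough timing harness along these lines should do it (a minimal sketch; it assumes the same S3 URL as above and that python-cjson is importable for the second measurement):

import time
import json
import requests

r = requests.get('https://s3-us-west-1.amazonaws.com/riot-api/seed_data/matches1.json')
r.encoding = 'UTF-8'  # avoid the slow charset detection when reading r.text
text = r.text

start = time.time()
json.loads(text)
print('json.loads: %.2f seconds' % (time.time() - start))

try:
    import cjson
    start = time.time()
    cjson.decode(r.content)  # same call as in the first example above
    print('cjson.decode: %.2f seconds' % (time.time() - start))
except ImportError:
    pass  # python-cjson is not installed (it doesn't build on Python 3)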

Edit 2: It seems cjson doesn't install on Python 3. That's OK: json.loads in this version only takes .54 seconds. Charset detection is still glacial, though, and commenting out the r.encoding = 'UTF-8' line still makes the test script run in O(eternal) time. If you can count on those files always being UTF-8 encoded, I think the performance secret is to put that information in your script so that it doesn't have to figure it out at runtime. With that boost, you don't need to bother with supplying your own JSON parser. Just run:

import requests

r = requests.get('https://s3-us-west-1.amazonaws.com/riot-api/seed_data/matches1.json')
r.encoding = 'UTF-8'
print(r.json())
Kirk Strauser answered Jan 31 '26 at 21:01


It looks like requests uses simplejson to decode the JSON. If you just get the data with r.content and then use the built-in Python json library, json.loads(r.content) works very quickly. It will raise an error if the JSON is invalid, but that's better than hanging for a long time.
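A note on the TypeError mentioned in the question edit: on Python 3, r.content is bytes, and json.loads only accepts bytes from Python 3.6 onward, so on older versions decoding explicitly should work around it. A small sketch, assuming the files are UTF-8 encoded:

import json
import requests

# Decode the raw bytes ourselves: no charset detection and no bytes/str TypeError.
r = requests.get('https://s3-us-west-1.amazonaws.com/riot-api/seed_data/matches1.json')
data = json.loads(r.content.decode('utf-8'))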

like image 33
BrenBarn Avatar answered Jan 31 '26 22:01

BrenBarn


