
Python 2.6 JSON decoding performance

I'm using the json module in Python 2.6 to load and decode JSON files, but performance is slower than I expected. My test case is 6 MB in size, and json.loads() takes 20 seconds.

I thought the json module had some native code to speed up the decoding?

How do I check if this is being used?

As a comparison, I downloaded and installed the python-cjson module, and cjson.decode() is taking 1 second for the same test case.

I'd rather use the JSON module provided with Python 2.6 so that users of my code aren't required to install additional modules.

(I'm developing on Mac OS X, but I'm getting a similar result on Windows XP.)

asked Apr 01 '09 15:04 by James Austin


3 Answers

The new Yajl - Yet Another JSON Library is very fast.

yajl        serialize: 0.180  deserialize: 0.182  total: 0.362
simplejson  serialize: 0.840  deserialize: 0.490  total: 1.331
stdlib json serialize: 2.812  deserialize: 8.725  total: 11.537

You can compare the libraries yourself.

Update: UltraJSON is even faster.
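For anyone who wants to run such a comparison, here is a minimal timing sketch. It is not the benchmark script that produced the numbers above; the `bench` helper and the generated `sample` document are hypothetical stand-ins, and simplejson is included only if it happens to be installed.

```python
import json
from timeit import default_timer as timer


def bench(name, dumps, loads, text, repeat=10):
    """Time `repeat` rounds of decoding `text`, then re-encoding the result."""
    start = timer()
    for _ in range(repeat):
        obj = loads(text)
    deserialize = timer() - start

    start = timer()
    for _ in range(repeat):
        dumps(obj)
    serialize = timer() - start
    return name, serialize, deserialize


# Hypothetical sample data; substitute the contents of your own JSON file.
sample = json.dumps(
    [{"id": i, "name": "item %d" % i, "tags": ["a", "b"]} for i in range(1000)]
)

candidates = [("stdlib json", json.dumps, json.loads)]
try:
    import simplejson
    candidates.append(("simplejson", simplejson.dumps, simplejson.loads))
except ImportError:
    pass  # simplejson is optional; time only what is installed

for name, dumps, loads in candidates:
    _, ser, de = bench(name, dumps, loads, sample)
    print("%-12s serialize: %.3f  deserialize: %.3f  total: %.3f"
          % (name, ser, de, ser + de))
```

Other libraries can be added to `candidates` the same way, as long as they expose an encode and a decode callable.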

answered Oct 07 '22 00:10 by Ivo Danihelka


It may vary by platform, but the builtin json module is based on simplejson, not including the C speedups. I've found simplejson to be as fast as python-cjson anyway, so I prefer it since it obviously has the same interface as the builtin.

try:
    import simplejson as json
except ImportError:
    import json

Seems to me that's the best idiom for a while, giving you the faster decoder when it's available while staying forwards-compatible.
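As for checking whether the native code is being used: a rough sketch is to look at the accelerator hooks the stdlib module keeps around. `json.decoder.c_scanstring` is set to None when the `_json` extension fails to import, and `json.scanner.c_make_scanner` exists on later versions; as far as I know it is absent on 2.6, which is why `getattr` is used here. The `json_speedups` helper name is my own.

```python
import json
import json.decoder
import json.scanner


def json_speedups():
    """Report which C accelerators the stdlib json decoder has loaded."""
    return {
        # None when the _json C extension could not be imported
        "c_scanstring": getattr(json.decoder, "c_scanstring", None) is not None,
        # Missing or None when the scanner runs as pure Python
        "c_make_scanner": getattr(json.scanner, "c_make_scanner", None) is not None,
    }


print(json_speedups())
```

If either value comes back False, at least part of the decoding runs in pure Python, and falling back to simplejson as above is likely to help.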

answered Oct 07 '22 01:10 by A. Coady


I was parsing the same file 10x. File size was 1,856,944 bytes.

Python 2.6:

yajl        serialize: 0.294  deserialize: 0.334  total: 0.627
cjson       serialize: 0.494  deserialize: 0.276  total: 0.769
simplejson  serialize: 0.554  deserialize: 0.268  total: 0.823
stdlib json serialize: 3.917  deserialize: 17.508 total: 21.425

Python 2.7:

yajl        serialize: 0.289  deserialize: 0.312  total: 0.601
cjson       serialize: 0.232  deserialize: 0.254  total: 0.486
simplejson  serialize: 0.288  deserialize: 0.253  total: 0.540
stdlib json serialize: 0.273  deserialize: 0.256  total: 0.528

Not sure why these numbers are out of proportion to your results. Newer library versions, I guess?

answered Oct 07 '22 01:10 by Tomas