Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is faster - Loading a pickled dictionary object or Loading a JSON file - to a dictionary? [closed]

What is faster:

(A) 'Unpickling' (Loading) a pickled dictionary object, using pickle.load()

or

(B) Loading a JSON file to a dictionary using simplejson.load()

Assuming: The pickled object file exists already in case A, and that the JSON file exists already in case B.

like image 596
Pranjal Mittal Avatar asked Aug 29 '13 18:08

Pranjal Mittal


People also ask

Which is faster pickle or JSON?

JSON is a lightweight format and is much faster than Pickling. There is always a security risk with Pickle. Unpickling data from unknown sources should be avoided as it may contain malicious or erroneous data. There are no loopholes in security using JSON, and it is free from security threats.

How long does it take to load a pickle file?

Loading this pickle takes around 15 seconds. How can I reduce this time? The code bellow shows how much time takes to dump or load a file using json, pickle, cPickle. After dumping, file size would be around 300MB.

Are pickle files fast?

Pickle is slow Pickle is both slower and produces larger serialized values than most of the alternatives. Pickle is the clear underperformer here. Even the 'cPickle' extension that's written in C has a serialization rate that's about a quarter that of JSON or Thrift.

What is better than pickle Python?

An alternative is cPickle. It is nearly identical to pickle , but written in C, which makes it up to 1000 times faster. For small files, however, you won't notice the difference in speed. Both produce the same data streams, which means that Pickle and cPickle can use the same files.


1 Answers

The speed actually depends on the data, it's content and size.

But, anyway, let's take an example json data and see what is faster (Ubuntu 12.04, python 2.7.3) :

  • pickle
  • cPickle
  • json
  • simplejson
  • ujson
  • yajl

Giving this json structure dumped into test.json and test.pickle files:

{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

Testing script:

import timeit

import pickle
import cPickle

import json
import simplejson
import ujson
import yajl


def load_pickle(f):
    return pickle.load(f)


def load_cpickle(f):
    return cPickle.load(f)


def load_json(f):
    return json.load(f)


def load_simplejson(f):
    return simplejson.load(f)


def load_ujson(f):
    return ujson.load(f)


def load_yajl(f):
    return yajl.load(f)


print "pickle:"
print timeit.Timer('load_pickle(open("test.pickle"))', 'from __main__ import load_pickle').timeit()

print "cpickle:"
print timeit.Timer('load_cpickle(open("test.pickle"))', 'from __main__ import load_cpickle').timeit()

print "json:"
print timeit.Timer('load_json(open("test.json"))', 'from __main__ import load_json').timeit()

print "simplejson:"
print timeit.Timer('load_simplejson(open("test.json"))', 'from __main__ import load_simplejson').timeit()

print "ujson:"
print timeit.Timer('load_ujson(open("test.json"))', 'from __main__ import load_ujson').timeit()

print "yajl:"
print timeit.Timer('load_yajl(open("test.json"))', 'from __main__ import load_yajl').timeit()

Output:

pickle:
107.936687946

cpickle:
28.4231381416

json:
31.6450419426

simplejson:
20.5853149891

ujson:
16.9352178574

yajl:
18.9763481617

As you can see, unpickling via pickle is not that fast at all - cPickle is definetely the way to go if you choose pickling/unpickling option. ujson looks promising among these json parsers on this particular data.

Also, json and simplejson libraries load much faster on pypy (see Python JSON Performance).

See also:

  • Python JSON decoding performance
  • Pickle or json?
  • Pickle vs JSON — Which is Faster?

It's important to note that the results may differ on your particular system, on other type and size of data.

like image 191
alecxe Avatar answered Sep 30 '22 19:09

alecxe