
Pickle alternatives

I am trying to serialize a large (~10**6 rows, each with ~20 values) list, to be used later by myself (so pickle's lack of safety isn't a concern).

Each row of the list is a tuple of values, derived from some SQL database. So far, I have seen datetime.datetime, strings, integers, and NoneType, but I might eventually have to support additional data types.

For serialization, I've considered pickle (cPickle), json, and plain text - but only pickle saves the type information: json can't serialize datetime.datetime, and plain text has its obvious disadvantages.

However, cPickle is pretty slow for data this large, and I'm looking for a faster alternative.

Guy Adini asked Mar 27 '12 20:03


People also ask

What is the difference between pickle and Joblib?

Joblib provides joblib.dump and joblib.load as replacements for pickle that are more efficient on objects carrying large numpy arrays. These functions also accept file-like objects instead of filenames.

Are pickles faster than JSON?

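JSON is a lightweight text format and is often faster than pickling for simple data, although the difference depends on the data and on the pickle protocol used. Pickle also carries a security risk: unpickling data from untrusted sources should be avoided, because a pickle can execute arbitrary code when it is loaded. Parsing JSON does not execute code, so it is much safer for untrusted input, but it supports only a limited set of types.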

What is the difference between pickle and cPickle?

The pickle data format is standardized, so strings serialized with pickle can be deserialized with cPickle and vice versa. The main difference between cPickle and pickle is performance: the cPickle module (Python 2) is many times faster because it is written in C. In Python 3, pickle uses the C implementation automatically when it is available.

Are Python pickles efficient?

The advantage of using pickle is that it can serialize pretty much any Python object without any extra code. It's also smart in that it will only write out any single object once, making it effective for storing recursive structures like graphs.


1 Answer

Pickle is actually quite fast so long as you aren't using the ASCII protocol (protocol 0, the default in Python 2). Just make sure to dump using protocol=pickle.HIGHEST_PROTOCOL.
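As a rough illustration of that advice, here is a minimal sketch in Python 3 syntax (where the C implementation is used automatically by the pickle module). The sample rows and the helper names dump_rows and load_rows are made up for the example; the only part taken from the answer is passing protocol=pickle.HIGHEST_PROTOCOL.

```python
import datetime
import io
import pickle

# Hypothetical sample rows standing in for the SQL result set:
# tuples mixing integers, strings, datetimes and None.
rows = [
    (1, "alice", datetime.datetime(2012, 3, 27, 20, 3), None),
    (2, "bob", datetime.datetime(2012, 3, 28, 9, 30), 42),
]


def dump_rows(rows, fileobj):
    """Write rows to a binary file object using the newest pickle protocol."""
    pickle.dump(rows, fileobj, protocol=pickle.HIGHEST_PROTOCOL)


def load_rows(fileobj):
    """Read back whatever dump_rows wrote; the protocol is detected automatically."""
    return pickle.load(fileobj)


# Round trip through an in-memory buffer; a real file would be opened
# in binary mode instead, e.g. open(path, "wb") and open(path, "rb").
buf = io.BytesIO()
dump_rows(rows, buf)
buf.seek(0)
restored = load_rows(buf)
```

The binary protocols need the file opened in binary mode ("wb" / "rb"). Types such as datetime.datetime and None come back unchanged, which is the property the question needs.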

Jake Biesinger answered Sep 18 '22 13:09