I'm working through some scipy lectures (http://scipy-lectures.github.io/intro/language/standard_library.html#pickle-easy-persistence) and I came across this statement about Pickle:
Useful to store arbitrary objects to a file. Not safe or fast!
What do they mean by this? Not safe (according to Pickle docs) as in don't UnPickle files from an unknown origin or not safe as in you don't always retrieve the original object?
What's the alternative for something safer and faster? I know about cPickle being faster, but I don't think it solves the above definition of safer.
Thanks.
Pickle is slow Pickle is both slower and produces larger serialized values than most of the alternatives. Pickle is the clear underperformer here. Even the 'cPickle' extension that's written in C has a serialization rate that's about a quarter that of JSON or Thrift.
Pickle is unsafe because it constructs arbitrary Python objects by invoking arbitrary functions. However, this is also gives it the power to serialize almost any Python object, without any boilerplate or even white-/black-listing (in the common case).
The insecurity is not because pickles contain code, but because they create objects by calling constructors named in the pickle. Any callable can be used in place of your class name to construct objects. Malicious pickles will use other Python callables as the “constructors.” For example, instead of executing “models.
JSON is a lightweight format and is much faster than Pickling. There is always a security risk with Pickle. Unpickling data from unknown sources should be avoided as it may contain malicious or erroneous data. There are no loopholes in security using JSON, and it is free from security threats.
Using pickle in production code is vulnerable by design. Arbitrary code can be executed while unpickling. You can safely unpickle only data from trusted sources. Never unpickle data received from an untrusted or unauthenticated source.
See here for real applications samples.
As for faster alternative, there is marshal
, python internal serealization library. But unlike pickle (or cPickle, which is just a C implementation), it is less stable (see docs) and its output being architecture and os independend, depends on python version. That is object marshal'ed on Windows platform with python 2.7.5 is guaranteed to be un-marshalable on OS X or Ubuntu with python 2.7.5 installed, but not guaranteed to be un-marshalable with python 2.6 on Windows.
Another faster, safer by design, but less functional serialization alternative is JSON
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With