The pickle module documentation says right at the beginning:
Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
However, further down under restricting globals it seems to describe a way to make unpickling data safe using a whitelist of allowed objects.
Does this mean that I can safely unpickle untrusted data if I use a RestrictedUnpickler that allows only some "elementary" types, or are there additional security issues that are not addressed by this method? If there are, is there another way to make unpickling safe (obviously at the cost of not being able to unpickle every stream)?
With "elementary types" I mean precisely the following:
bool
str, bytes, bytearray
int, float, complex
tuple, list, dict, set and frozenset
In this answer we're going to explore what exactly the pickle protocol allows an attacker to do. This means we're only going to rely on documented features of the protocol, not implementation details (with a few exceptions). In other words, we'll assume that the source code of the pickle module is correct and bug-free and allows us to do exactly what the documentation says and nothing more.
Pickle allows classes to customize how their instances are pickled. During the unpickling process, we can:
Call (almost) any class's __setstate__ method (as long as we manage to unpickle an instance of that class).
Invoke arbitrary callables with arbitrary arguments, thanks to the __reduce__ method (as long as we can gain access to the callable somehow).
Invoke (almost) any unpickled object's append, extend and __setitem__ methods, once again thanks to __reduce__.
Access any attribute that Unpickler.find_class allows us to.
Freely create instances of the following types: str, bytes, list, tuple, dict, int, float, bool. This is not documented, but these types are built into the protocol itself and don't go through Unpickler.find_class.
The most useful (from an attacker's perspective) feature here is the ability to invoke callables. If they can access exec or eval, they can make us execute arbitrary code. If they can access os.system or subprocess.Popen, they can run arbitrary shell commands. Of course, we can deny them access to these with Unpickler.find_class. But how exactly should we implement our find_class method? Which functions and classes are safe, and which are dangerous?
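To make the two points above concrete, here is a minimal, harmless sketch. The names Payload, LoggingUnpickler and load_and_log are made up for illustration, and sorted stands in for whatever callable an attacker would actually pick. It shows that __reduce__ makes the unpickler call a function, and that plain built-in containers never reach find_class at all.

```python
import io
import pickle


class Payload:
    """Hypothetical attacker-controlled class; sorted() stands in for os.system."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call sorted([3, 1, 2])".
        return (sorted, ([3, 1, 2],))


class LoggingUnpickler(pickle.Unpickler):
    """Records every global that the pickle stream asks for."""

    def __init__(self, file):
        super().__init__(file)
        self.lookups = []

    def find_class(self, module, name):
        self.lookups.append((module, name))
        return super().find_class(module, name)


def load_and_log(data):
    unpickler = LoggingUnpickler(io.BytesIO(data))
    return unpickler.load(), unpickler.lookups


# Built-in containers and scalars are created by dedicated opcodes.
plain, plain_lookups = load_and_log(pickle.dumps([1, "a", {"k": (2.0, True)}]))

# The Payload instance is never rebuilt; the callable's result is returned instead.
result, payload_lookups = load_and_log(pickle.dumps(Payload()))
```

With the default protocol, plain_lookups stays empty, while payload_lookups contains the single lookup for builtins.sorted and result is the sorted list rather than a Payload instance.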
Here I'll try to explain some methods an attacker can use to do evil things. Giving an attacker access to any of these functions/classes means you're in danger.
exec and eval (duh)
os.system, os.popen, subprocess.Popen and all other subprocess functions
types.FunctionType, which lets you create a function from a code object (which can itself be created with compile or types.CodeType)
typing.get_type_hints. Yes, you read that right. How, you ask? Well, typing.get_type_hints evaluates forward references. So all you need is an object with __annotations__ like {'x': 'os.system("rm -rf /")'} and get_type_hints will run the code for you.
functools.singledispatch. I see you shaking your head in disbelief, but it's true. Single-dispatch functions have a register method, which internally calls typing.get_type_hints.
Accessing things without going through Unpickler.find_class:
Just because our find_class method prevents an attacker from accessing something directly doesn't mean there's no indirect way of accessing that thing.
Attribute access: an object's class can be accessed as obj.__class__, a class's parents can be accessed as cls.__bases__, etc.
getattr
operator.attrgetter
object.__getattribute__
Tools.scripts.find_recursionlimit.RecursiveBlowup5.__getattr__
Indexing: Lots of things are stored in lists, tuples and dicts - being able to index data structures opens many doors for an attacker.
operator.itemgetter
list.__getitem__, dict.__getitem__, etc.
See Ned Batchelder's Eval is really dangerous to find out how an attacker can use these to gain access to pretty much everything.
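As a rough illustration of why attribute access plus indexing is enough to get far, the sketch below walks from an empty tuple to the list of object's direct subclasses using nothing but operator.attrgetter and operator.itemgetter. It is deliberately harmless and stops well short of an exploit; the variable names are only for illustration.

```python
import operator

# Each helper is an ordinary callable, i.e. something a pickle stream could
# invoke through __reduce__ if find_class handed it out.
get_class = operator.attrgetter("__class__")
get_bases = operator.attrgetter("__bases__")
first_item = operator.itemgetter(0)

# () -> tuple -> (object,) -> object
root = first_item(get_bases(get_class(())))

# From object, the direct subclasses currently loaded are one call away.
subclasses = root.__subclasses__()
```

Starting from a completely innocent value, two attribute lookups and one index operation reach object, and from there a large part of the interpreter's class hierarchy.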
Code execution after unpickling:
An attacker doesn't necessarily have to do something dangerous during the unpickling process - they can also try to return a dangerous object and let you call a dangerous function by accident. Maybe you call typing.get_type_hints on the unpickled object, or maybe you expect to unpickle a CuteBunny but instead unpickle a FerociousDragon and get your hand bitten off when you try to .pet() it. Always make sure the unpickled object is of the type you expect, its attributes are of the types you expect, and it doesn't have any attributes you don't expect it to have.
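The typing.get_type_hints hazard can be reproduced without any damage. In this sketch the class Innocent and the function record are hypothetical stand-ins: record plays the role of os.system and simply notes that it was called. The globalns argument is passed explicitly only to keep the example self-contained.

```python
import typing

evaluated = []


def record(message):
    """Harmless stand-in for a dangerous function."""
    evaluated.append(message)
    return int  # get_type_hints expects the expression to yield a type


class Innocent:
    pass


# String annotations are treated as forward references and evaluated.
Innocent.__annotations__ = {"x": "record('annotation string was evaluated')"}

hints = typing.get_type_hints(Innocent, globalns={"record": record})
```

After the call, evaluated is no longer empty: merely asking for the type hints executed the string stored in __annotations__.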
At this point, it should be obvious that there aren't many modules/classes/functions you can trust. When you implement your find_class method, never ever write a blacklist - always write a whitelist, and only include things you're sure can't be abused.
If you really only allow access to bool, str, bytes, bytearray, int, float, complex, tuple, list, dict, set and frozenset then you're most likely safe. But let's be honest - you should probably use JSON instead.
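For reference, a whitelist along those lines could look roughly like this. It follows the pattern from the "Restricting Globals" section of the pickle documentation; the names SAFE_BUILTINS and restricted_loads are my own, and this is a sketch of the idea rather than a vetted security boundary.

```python
import builtins
import io
import pickle

# Only these names from the builtins module may be looked up.
SAFE_BUILTINS = frozenset({
    "bool", "str", "bytes", "bytearray",
    "int", "float", "complex",
    "tuple", "list", "dict", "set", "frozenset",
})


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Whitelist, not blacklist: everything not listed is rejected.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Analogous to pickle.loads(), but only for the whitelisted types."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

A stream that references eval, os.system or any class outside the set fails with pickle.UnpicklingError, while nested combinations of the listed types load normally.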
In general, I think most classes are safe - with exceptions like subprocess.Popen, of course. The worst thing an attacker can do is call the class - which generally shouldn't do anything more dangerous than return an instance of that class.
What you really need to be careful about is allowing access to functions (and other non-class callables), and how you handle the unpickled object.
I'd go so far as saying that there is no safe way to use pickle to handle untrusted data.
Even with restricted globals, the dynamic nature of Python is such that a determined hacker still has a chance of finding a way back to the __builtins__ mapping and from there to the Crown Jewels.
See Ned Batchelder's blog posts on circumventing restrictions on eval() that apply in equal measure to pickle.
Remember that pickle is still a stack language and you cannot foresee all possible objects produced from allowing arbitrary calls even to a limited set of globals. The pickle documentation also doesn't mention the EXT* opcodes that allow calling copyreg-installed extensions; you'll have to account for anything installed in that registry too. All it takes is one vector allowing an object call to be turned into a getattr equivalent for your defences to crumble.
At the very least, apply a cryptographic signature to your data so you can validate its integrity. This limits the risk, but if an attacker ever managed to steal your signing secrets (keys), they could again slip you a hacked pickle.
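A minimal sketch of such a signature check, using HMAC-SHA256 from the standard library. The function names sign_pickle and load_signed are hypothetical, and how the key is generated, stored and rotated is assumed to be handled elsewhere.

```python
import hashlib
import hmac
import pickle

MAC_SIZE = hashlib.sha256().digest_size  # 32 bytes


def sign_pickle(obj, key):
    """Pickle obj and prepend an HMAC-SHA256 tag computed with key."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def load_signed(blob, key):
    """Verify the tag first; unpickle only if it matches."""
    tag, payload = blob[:MAC_SIZE], blob[MAC_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch, refusing to unpickle")
    return pickle.loads(payload)
```

The important detail is the order: pickle.loads is never reached for data whose tag does not verify. This only proves that the data came from someone holding the key; it does nothing to make the pickle itself harmless.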
I would instead use an existing innocuous format like JSON and add type annotations; e.g. store data in dictionaries with a type key and convert when loading the data.
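One way this could look is sketched below. The marker key "__type__" and the helper names are assumptions made for the example, not an established convention. Only types listed in DECODERS can ever be reconstructed, so the data cannot name an arbitrary callable.

```python
import json

TYPE_KEY = "__type__"  # hypothetical marker key


def encode_extra(obj):
    """json.dumps 'default' hook for types JSON lacks natively."""
    if isinstance(obj, complex):
        return {TYPE_KEY: "complex", "real": obj.real, "imag": obj.imag}
    if isinstance(obj, (set, frozenset)):
        return {TYPE_KEY: type(obj).__name__, "items": list(obj)}
    raise TypeError(f"cannot serialize {type(obj).__name__}")


# Fixed table of converters; the input only selects among these.
DECODERS = {
    "complex": lambda d: complex(d["real"], d["imag"]),
    "set": lambda d: set(d["items"]),
    "frozenset": lambda d: frozenset(d["items"]),
}


def decode_extra(d):
    """json.loads 'object_hook': convert tagged dicts, pass others through."""
    kind = d.get(TYPE_KEY)
    if kind is None:
        return d
    if kind not in DECODERS:
        raise ValueError(f"unknown type tag: {kind!r}")
    return DECODERS[kind](d)


def dumps(value):
    return json.dumps(value, default=encode_extra)


def loads(text):
    return json.loads(text, object_hook=decode_extra)
```

Tuples come back as lists and dict keys become strings, as usual with JSON, so this is not a drop-in replacement for pickle.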
This idea has also been discussed on the python-ideas mailing list, in the context of adding a safe pickle alternative to the standard library. For example here:
To make it safer I would have a restricted unpickler as the default (for load/loads) and force people to override it if they want to loosen restrictions. To be really explicit, I would make load/loads only work with built-in types.
And also here:
I've always wanted a version of pickle.loads() that takes a list of classes that are allowed to be instantiated.
Is the following enough for you: http://docs.python.org/3.4/library/pickle.html#restricting-globals ?
Indeed, it is. Thanks for pointing it out! I've never gotten past the module interface part of the docs. Maybe the warning at the top of the page could also mention that there are ways to mitigate the safety concerns, and point to #restricting-globals?
Yes, that would be a good idea :-)
So I don't know why the documentation has not been changed, but in my opinion, using a RestrictedUnpickler to restrict the types that can be unpickled is a safe solution. Of course there could be bugs in the library that compromise the system, but there could also be a bug in OpenSSL that shows random memory data to everyone who asks.