Relationship between pickle and deepcopy

Tags: python, python-2.x, deep-copy, pickle

What exactly is the relationship between pickle and copy.deepcopy? What mechanisms do they share, and how?

It is clear the two are closely-related operations, and share some of the mechanisms/protocols, but I can't wrap my head around the details.

Some (confusing) things I found out:

(1) If a class defines __[gs]etstate__, they get called upon a deepcopy of its instances. This surprised me at first, because I thought they were specific to pickle, but then I found this in the documentation: "Classes can use the same interfaces to control copying that they use to control pickling." However, there's no documentation of how __[gs]etstate__ are used when deepcopying: how is the value returned from __getstate__ used, and what is passed to __setstate__?

(2) A naive alternative implementation of deepcopy would be pickle.loads(pickle.dumps(obj)). However, this can't possibly be equivalent to deepcopy'ing, because if a class defines a __deepcopy__ method, it would not be invoked by this pickle-based implementation. (I also stumbled upon a statement that deepcopy is more general than pickle, and that there are many types which are deepcopyable but not pickleable.)

(1) indicates a commonality, while (2) indicates a difference between pickle and deepcopy.
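Both observations can be reproduced with a small sketch. It assumes Python 3, and the class names `Tracked` and `Custom` are hypothetical, made up only for this illustration:

```python
import copy
import pickle


class Tracked:
    """Records which state hooks run during a copy."""

    def __init__(self, payload):
        self.payload = payload
        self.log = []

    def __getstate__(self):
        self.log.append("getstate")
        return {"payload": self.payload}

    def __setstate__(self, state):
        # Called on the new, uninitialised instance with a deep copy
        # of whatever __getstate__ returned on the original.
        self.payload = state["payload"]
        self.log = ["setstate"]


class Custom:
    """Defines __deepcopy__, a hook that only the copy module looks for."""

    def __init__(self, value):
        self.value = value
        self.made_by = "__init__"

    def __deepcopy__(self, memo):
        clone = Custom(copy.deepcopy(self.value, memo))
        clone.made_by = "__deepcopy__"
        return clone


# Observation (1): deepcopy uses the pickle state hooks.
original = Tracked([1, 2])
clone = copy.deepcopy(original)

# Observation (2): a pickle round trip never calls __deepcopy__.
via_copy = copy.deepcopy(Custom([1]))
via_pickle = pickle.loads(pickle.dumps(Custom([1])))
```

After this runs, `original.log` holds the `__getstate__` call and `clone.log` holds the `__setstate__` call, while `via_copy.made_by` and `via_pickle.made_by` differ because only `deepcopy` went through `__deepcopy__`.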

On top of that, I found these two contradictory statements:

copy_reg: The pickle, cPickle, and copy modules use those functions when pickling/copying those objects

and

The copy module does not use the copy_reg registration module

This, on one hand, is another indication of a relationship/commonality between pickle and deepcopy, and on the other hand, contributes to my confusion...

[My experience is with python2.7, but I'd also appreciate any pointers regarding the differences in pickle/deepcopy between python2 and python3]

asked Mar 13 '14 19:03

shx2

2 Answers

You should not be confused by (1) and (2). In general, Python tries to include sensible fall-backs for missing methods. (For instance, it is enough to define __getitem__ in order to have an iterable class, but it may be more efficient to also implement __iter__. The same goes for operations like __add__, with the optional __iadd__, etc.)
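The `__getitem__` fall-back mentioned above can be seen in a few lines. This is only an illustration of the general fall-back idea, and the `Countdown` class is a hypothetical example:

```python
class Countdown:
    """Iterable without __iter__: iteration falls back to __getitem__."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        # iter() calls this with 0, 1, 2, ... until IndexError is raised.
        if index >= self.start:
            raise IndexError(index)
        return self.start - index


items = list(Countdown(3))
```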

__deepcopy__ is the most specialized method that deepcopy() will look for, but if it does not exist, falling back to the pickle protocol is a sensible thing to do. It does not really call dumps()/loads(), because it does not rely on the intermediate representation being a string, but it will indirectly make use of __getstate__ and __setstate__ (via __reduce__), as you observed.
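One consequence of not serializing to a string is that deepcopy can handle objects pickle refuses. The sketch below assumes Python 3, and the `Handler` class is hypothetical; it stores a lambda, which pickle cannot serialize but which deepcopy treats as atomic and shares by reference:

```python
import copy
import pickle


class Handler:
    def __init__(self):
        self.callback = lambda x: x + 1  # not picklable
        self.history = []


handler = Handler()

# deepcopy works: the function is shared, the list is duplicated.
duplicate = copy.deepcopy(handler)

# pickle fails; the exact exception type depends on the Python version.
try:
    pickle.dumps(handler)
    pickled_ok = True
except (pickle.PicklingError, AttributeError):
    pickled_ok = False
```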

Currently, the documentation still states

… The copy module does not use the copy_reg registration module.

but that seems to be a bug that has been fixed in the meantime (possibly, the 2.7 branch has not gotten enough attention here).

Also note that this is pretty deeply integrated into Python (at least nowadays); the object class itself implements __reduce__ (and its versioned _ex variant), which refers to copy_reg.__newobj__ for creating fresh instances of the given object-derived class.
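As a rough check of this, the reduce value of a plain instance can be inspected and replayed by hand. The sketch assumes Python 3, where `copy_reg` is named `copyreg`; the `Config` class is hypothetical, and the manual steps only approximate what deepcopy does internally:

```python
import copy
import copyreg


class Config:
    def __init__(self, options):
        self.options = options

    def __getstate__(self):
        return {"options": self.options, "version": 1}

    def __setstate__(self, state):
        self.options = state["options"]
        self.seen_version = state["version"]


cfg = Config({"debug": True})

# object.__reduce_ex__ supplies (callable, args, state, ...) by default.
func, args, state = cfg.__reduce_ex__(2)[:3]

# Replaying the reduce value by hand: create a bare instance,
# then hand it a deep copy of the state.
manual = func(*args)
manual.__setstate__(copy.deepcopy(state))

automatic = copy.deepcopy(cfg)
```

Here `func` is `copyreg.__newobj__`, `args` is `(Config,)`, and `state` is whatever `__getstate__` returned.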

answered Sep 27 '22 19:09

hans_meine

Ok, I had to read the source code for this one, but it looks like it's a pretty simple answer. http://svn.python.org/projects/python/trunk/Lib/copy.py

copy first looks up the builtin types whose constructors it knows about (registered in the _copy_dispatch dictionary). When it doesn't know how to copy the type, it imports copy_reg.dispatch_table, which is the place where pickle registers the functions it knows for producing new copies of objects. Essentially, it's a dictionary mapping the type of an object to a "function to produce a new object". This "function to produce a new object" is pretty much what you write when you write a __reduce__ or a __reduce_ex__ method for an object (and if one of those is missing or needs help, it defers to the __setstate__, __getstate__, etc. methods).

So that's copy. Basically… (with some additional clauses…)

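    # Excerpt from Lib/copy.py (Python 2.7). _copy_dispatch, dispatch_table,
    # Error and _reconstruct are module-level names in that file.
    def copy(x):
        """Shallow copy operation on arbitrary Python objects.

        See the module's __doc__ string for more info.
        """

        cls = type(x)

        # 1. Types the copy module knows how to copy directly.
        copier = _copy_dispatch.get(cls)
        if copier:
            return copier(x)

        # 2. A class-specific __copy__ method.
        copier = getattr(cls, "__copy__", None)
        if copier:
            return copier(x)

        # 3. Reduction functions registered with copy_reg, then the
        #    pickle protocol methods __reduce_ex__ / __reduce__.
        reductor = dispatch_table.get(cls)
        if reductor:
            rv = reductor(x)
        else:
            reductor = getattr(x, "__reduce_ex__", None)
            if reductor:
                rv = reductor(2)
            else:
                reductor = getattr(x, "__reduce__", None)
                if reductor:
                    rv = reductor()
                else:
                    raise Error("un(shallow)copyable object of type %s" % cls)

        # Build the new object from the reduce value.
        return _reconstruct(x, rv, 0)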

deepcopy does the same thing as the above, but in addition it recurses into each object and makes sure the result holds a copy of every contained object rather than a reference to the original. deepcopy builds its own _deepcopy_dispatch table (a dict) where it registers functions that ensure the new objects produced do not hold references to the originals (possibly generated with the __reduce__ functions registered in copy_reg.dispatch_table).
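The difference between the two is easy to observe on nested containers. A small sketch, assuming Python 3; it also shows that deepcopy keeps sharing intact inside the copy, which it tracks through its memo dictionary:

```python
import copy

inner = [1, 2]
outer = {"a": inner, "b": inner}  # the same list is referenced twice

shallow = copy.copy(outer)
deep = copy.deepcopy(outer)

# The shallow copy still points at the original inner list.
shallow_shares_inner = shallow["a"] is inner

# The deep copy has its own inner list, copied once and reused.
deep_shares_inner = deep["a"] is inner
deep_keeps_aliasing = deep["a"] is deep["b"]
```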

Hence, writing a __reduce__ method (or similar) and registering it with copy_reg should enable copy and deepcopy to do their thing as well.
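A minimal sketch of that registration, assuming Python 3 (where the module is called `copyreg`). The `Point` class and the `reduce_point` function are hypothetical names chosen for this example:

```python
import copy
import copyreg


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


calls = []


def reduce_point(point):
    """Reduction function: returns (callable, args) to rebuild a Point."""
    calls.append("reduce_point")
    return (Point, (point.x, point.y))


# Register the reduction function for Point in copyreg.dispatch_table.
copyreg.pickle(Point, reduce_point)

p = Point(1, [2, 3])
shallow = copy.copy(p)      # args are passed through unchanged
deep = copy.deepcopy(p)     # args are deep-copied before the call
```

Both `copy.copy` and `copy.deepcopy` call `reduce_point` once each, so `calls` ends up with two entries; only the deep copy gets its own list for `y`.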

answered Sep 27 '22 19:09

Mike McKerns
