 

copy.deepcopy vs pickle

I have a tree structure of widgets, e.g. a collection contains models and a model contains widgets. I want to copy the whole collection. copy.deepcopy is faster than pickling and unpickling the object, but cPickle, being written in C, is much faster still. So:

  1. Why shouldn't I (we) always use cPickle instead of deepcopy?
  2. Is there any other copy alternative? pickle is slower than deepcopy, but cPickle is faster, so maybe a C implementation of deepcopy would be the overall winner.

Sample test code:

import copy
import pickle
import cPickle

class A(object):
    pass

d = {}
for i in range(1000):
    d[i] = A()

def copy1():
    return copy.deepcopy(d)

def copy2():
    return pickle.loads(pickle.dumps(d, -1))

def copy3():
    return cPickle.loads(cPickle.dumps(d, -1))

Timings:

>python -m timeit -s "import c" "c.copy1()"
10 loops, best of 3: 46.3 msec per loop

>python -m timeit -s "import c" "c.copy2()"
10 loops, best of 3: 93.3 msec per loop

>python -m timeit -s "import c" "c.copy3()"
100 loops, best of 3: 17.1 msec per loop
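(Note: these timings are from Python 2, where cPickle was a separate module. On Python 3 the C implementation is what the plain pickle module uses by default, so a rough modern re-run of the same comparison needs only copy and pickle. A sketch, not a claim about exact numbers:)

```python
# Python 3 version of the benchmark above: cPickle no longer exists
# as a separate module; pickle uses the C accelerator automatically.
import copy
import pickle
import timeit

class A(object):
    pass

d = {i: A() for i in range(1000)}

def copy1():
    return copy.deepcopy(d)

def copy2():
    # highest protocol, same as passing -1 in the Python 2 code
    return pickle.loads(pickle.dumps(d, pickle.HIGHEST_PROTOCOL))

if __name__ == "__main__":
    for fn in (copy1, copy2):
        t = timeit.timeit(fn, number=10)
        print("%s: %.1f msec per call" % (fn.__name__, t / 10 * 1000))
```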
asked Sep 11 '09 by Anurag Uniyal

People also ask

What is the difference between copy and Deepcopy?

copy() creates a shallow copy: the new object shares references with the original, so if you change a shared mutable child through the copy, you change it for the original too. deepcopy() creates a new object and recursively copies the original's contents into it, so changing the deep-copied object doesn't affect the original.
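The difference shows up as soon as the copied object contains something mutable. A minimal sketch:

```python
import copy

original = {"widgets": [1, 2, 3]}

shallow = copy.copy(original)      # new dict, but shares the inner list
deep = copy.deepcopy(original)     # new dict AND a new inner list

original["widgets"].append(4)

print(shallow["widgets"])  # [1, 2, 3, 4] -- the mutation is visible
print(deep["widgets"])     # [1, 2, 3]    -- fully independent copy
```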

What does copy Deepcopy do?

Deep copy is a process in which the copying process occurs recursively. It means first constructing a new collection object and then recursively populating it with copies of the child objects found in the original.

Is Deepcopy slow Python?

deepcopy() can be extremely slow for large object graphs, because it recursively visits and copies every object reachable from the original.

Why is shallow copy faster?

Deep copy doesn't reflect changes made to the new/copied object in the original object. A shallow copy stores a copy of the original object and points references at the same child objects; a deep copy stores a copy of the original object and recursively copies the child objects as well. Shallow copy is faster because it only copies the top level.


2 Answers

Problem is, pickle+unpickle can be faster (in the C implementation) because it's less general than deepcopy: many objects can be deepcopied but not pickled. Suppose for example that your class A were changed to...:

class A(object):
    class B(object):
        pass
    def __init__(self):
        self.b = self.B()

now, copy1 still works fine (A's complexity slows it down but absolutely doesn't stop it); copy2 and copy3 break, and the end of the stack trace says...:

  File "./c.py", line 20, in copy3
    return cPickle.loads(cPickle.dumps(d, -1))
PicklingError: Can't pickle <class 'c.B'>: attribute lookup c.B failed

I.e., pickling always assumes that classes and functions are top-level entities in their modules, and so pickles them "by name" -- deepcopying makes absolutely no such assumptions.
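You can see the same distinction on Python 3 with a class that is not reachable by name at all, e.g. one defined inside a function (the make_widget/Widget names below are made up for illustration). deepcopy copies its instances without complaint; pickle cannot:

```python
import copy
import pickle

def make_widget():
    # Widget is local to this function, so it is not reachable
    # "by name" from the module -- pickle cannot serialize it.
    class Widget(object):
        pass
    return Widget()

w = make_widget()

# deepcopy never needs the class's name: it walks the object graph
# and reuses the class object directly.
w2 = copy.deepcopy(w)
print(w2 is w)  # False: a genuinely new instance

# pickle, by contrast, must look the class up by qualified name and fails
# (depending on the Python version this surfaces as PicklingError or
# AttributeError).
try:
    pickle.dumps(w)
except (pickle.PicklingError, AttributeError) as exc:
    print("pickle failed:", exc)
```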

So if you have a situation where the speed of "somewhat deep-copying" is absolutely crucial, every millisecond matters, AND you want to take advantage of special limitations that you KNOW apply to the objects you're duplicating (such as the ones that make pickling applicable, or ones favoring yet other forms of serialization and shortcuts), by all means go ahead. But if you do, you MUST be aware that you're constraining your system to live by those limitations forevermore, and you should document that design decision very clearly and explicitly for the benefit of future maintainers.

For the NORMAL case, where you want generality, use deepcopy!-)

answered Sep 22 '22 by Alex Martelli


You should be using deepcopy because it makes your code more readable. Using a serialization mechanism to copy objects in memory is at the very least confusing to another developer reading your code. Using deepcopy also means you get to reap the benefits of future optimizations in deepcopy.

First rule of optimization: don't.

answered Sep 26 '22 by wds