Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is a fast pythonic way to deepcopy just data from a python dict or list ?

Tags:

When we need to copy full data from a dictionary containing primitive data types ( for simplicity, lets ignore presence of datatypes like datetime etc), the most obvious choice that we have is to use deepcopy, but deepcopy is slower than some other hackish methods of achieving the same i.e. using serialization-unserialization for example like json-dump-json-load or msgpack-pack-msgpack-unpack. The difference in efficiency can be seen here :

>>> import timeit >>> setup = ''' ... import msgpack ... import json ... from copy import deepcopy ... data = {'name':'John Doe','ranks':{'sports':13,'edu':34,'arts':45},'grade':5} ... ''' >>> print(timeit.timeit('deepcopy(data)', setup=setup)) 12.0860249996 >>> print(timeit.timeit('json.loads(json.dumps(data))', setup=setup)) 9.07182312012 >>> print(timeit.timeit('msgpack.unpackb(msgpack.packb(data))', setup=setup)) 1.42743492126 

json and msgpack (or cPickle) methods are faster than a normal deepcopy, which is obvious as deepcopy would be doing much more in copying all the attributes of the object too.

Question: Is there a more pythonic/inbuilt way to achieve just a data copy of a dictionary or list, without having all the overhead that deepcopy has ?

like image 849
DhruvPathak Avatar asked Aug 24 '17 09:08

DhruvPathak


People also ask

How fast is Deepcopy Python?

timeit says 41.8ms per deepcopy() . Alternatively to copying the state, you could create an action queue: Use the current state to determine the next actions and effects for all the objects without applying them right away, then apply all those actions in one batch, then calculate the actions for the next 'turn', etc.

How do you Deepcopy in Python?

To make a deep copy, use the deepcopy() function of the copy module. In a deep copy, copies are inserted instead of references to objects, so changing one does not change the other.

What is the difference between shallow copy and Deepcopy in Python?

A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original. A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

What is the difference between copy copy () and copy Deepcopy ()?

copy() create reference to original object. If you change copied object - you change the original object. . deepcopy() creates new object and does real copying of original object to new one. Changing new deepcopied object doesn't affect original object.


1 Answers

It really depends on your needs. deepcopy was built with the intention to do the (most) correct thing. It keeps shared references, it doesn't recurse into infinite recursive structures and so on... It can do that by keeping a memo dictionary in which all encountered "things" are inserted by reference. That's what makes it quite slow for pure-data copies. However I would almost always say that deepcopy is the most pythonic way to copy data even if other approaches could be faster.

If you have pure-data and a limited amount of types inside it you could build your own deepcopy (build roughly after the implementation of deepcopy in CPython):

_dispatcher = {}  def _copy_list(l, dispatch):     ret = l.copy()     for idx, item in enumerate(ret):         cp = dispatch.get(type(item))         if cp is not None:             ret[idx] = cp(item, dispatch)     return ret  def _copy_dict(d, dispatch):     ret = d.copy()     for key, value in ret.items():         cp = dispatch.get(type(value))         if cp is not None:             ret[key] = cp(value, dispatch)      return ret  _dispatcher[list] = _copy_list _dispatcher[dict] = _copy_dict  def deepcopy(sth):     cp = _dispatcher.get(type(sth))     if cp is None:         return sth     else:         return cp(sth, _dispatcher) 

This only works correct for all immutable non-container types and list and dict instances. You could add more dispatchers if you need them.

# Timings done on Python 3.5.3 - Windows - on a really slow laptop :-/  import copy import msgpack import json  import string  data = {'name':'John Doe','ranks':{'sports':13,'edu':34,'arts':45},'grade':5}  %timeit deepcopy(data) # 11.9 µs ± 280 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) %timeit copy.deepcopy(data) # 64.3 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) %timeit json.loads(json.dumps(data)) # 65.9 µs ± 2.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) %timeit msgpack.unpackb(msgpack.packb(data)) # 56.5 µs ± 2.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) 

Let's also see how it performs when copying a big dictionary containing strings and integers:

data = {''.join([a,b,c]): 1 for a in string.ascii_letters for b in string.ascii_letters for c in string.ascii_letters}  %timeit deepcopy(data) # 194 ms ± 5.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) %timeit copy.deepcopy(data) # 1.02 s ± 46.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) %timeit json.loads(json.dumps(data)) # 398 ms ± 20.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) %timeit msgpack.unpackb(msgpack.packb(data)) # 238 ms ± 8.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 
like image 179
MSeifert Avatar answered Sep 19 '22 14:09

MSeifert