I have this code:
import resource
from copy import deepcopy
import pandas as pd
import random
n = 100000
df = pd.DataFrame({'x': [random.randint(1, 1000) for i in range(n)],
                   'y': [random.randint(1, 1000) for i in range(n)],
                   'z': [random.randint(1, 1000) for i in range(n)]})
df2 = deepcopy(df)
print('memory peak: {kb} kb'.format(kb=resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))
I expected the peak memory usage to differ between running this code with the line df2 = deepcopy(df) and running it without, but the reported peak is exactly the same. Isn't deepcopy supposed to clone the object and therefore increase memory usage?
When you call copy.deepcopy on an object foo, deepcopy looks for a __deepcopy__ method on it and, if one is found, calls it to perform the copy. A DataFrame instance extends the NDFrame class, whose __deepcopy__ method delegates the operation to NDFrame.copy.
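Here is a minimal sketch of that protocol (the Reporter class and its print call are purely illustrative, not part of pandas):

import copy

class Reporter:
    def __deepcopy__(self, memo):
        # copy.deepcopy finds this method and calls it instead of
        # running its default recursive copying machinery
        print('__deepcopy__ called')
        return Reporter()

clone = copy.deepcopy(Reporter())   # prints: __deepcopy__ called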
Here's the implementation of NDFrame.__deepcopy__:
def __deepcopy__(self, memo=None):
"""
Parameters
----------
memo, default None
Standard signature. Unused
"""
if memo is None:
memo = {}
return self.copy(deep=True)
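So copy.deepcopy(df) ends up doing the same work as df.copy(deep=True). A quick check of that equivalence (a sketch; the values are arbitrary): the numeric data is copied, so modifying the copy leaves the original intact:

from copy import deepcopy
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]})
df2 = deepcopy(df)        # routed through NDFrame.copy(deep=True)
df2.loc[0, 'x'] = 99      # the numeric data was copied...
print(df.loc[0, 'x'])     # ...so the original still prints 1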
You can read NDFrame.copy's docstring in full, but here's the main excerpt:
def copy(self, deep=True):
"""
Make a copy of this object's indices and data.
When ``deep=True`` (default), a new object will be created with a
copy of the calling object's data and indices. Modifications to
the data or indices of the copy will not be reflected in the
original object (see notes below).
(...)
Notes
-----
When ``deep=True``, data is copied but actual Python objects
will not be copied recursively, only the reference to the object.
This is in contrast to `copy.deepcopy` in the Standard Library,
which recursively copies object data (see examples below).
In your example, the data is Python lists, which are therefore not actually copied.
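You can make that sharing visible with an object-dtype column (a small sketch along the lines of the example in the pandas docstring):

from copy import deepcopy
import pandas as pd

df = pd.DataFrame({'a': [[1, 2], [3, 4]]})   # object dtype: each cell holds a list
df2 = deepcopy(df)
df2.loc[0, 'a'].append(99)   # mutate the inner list through the "deep" copy
print(df.loc[0, 'a'])        # [1, 2, 99] -- the original sees the change too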