I have this code:
import resource
from copy import deepcopy
import pandas as pd
import random
n = 100000
df = pd.DataFrame({'x': [random.randint(1, 1000) for i in range(n)],
                   'y': [random.randint(1, 1000) for i in range(n)],
                   'z': [random.randint(1, 1000) for i in range(n)]})
df2 = deepcopy(df)
print('memory peak: {kb} kb'.format(kb=resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))
I expected the peak memory usage to differ between running this code with the line df2 = deepcopy(df) and running it without, but the reported peak is exactly the same. Isn't deepcopy supposed to clone the object and therefore increase memory usage?
When you call copy.deepcopy on an object foo, deepcopy looks for a __deepcopy__ method on it and, if one is found, calls it to perform the copy. A DataFrame instance extends the NDFrame class, whose __deepcopy__ method delegates the operation to NDFrame.copy.
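Here is a minimal sketch of that protocol (the Reporter class and its print call are purely illustrative, not part of pandas):

import copy

class Reporter:
    def __deepcopy__(self, memo):
        # copy.deepcopy finds this method and calls it instead of
        # running its default recursive copying machinery
        print('__deepcopy__ called')
        return Reporter()

clone = copy.deepcopy(Reporter())   # prints: __deepcopy__ called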
Here's the implementation of NDFrame.__deepcopy__:
def __deepcopy__(self, memo=None):
"""
Parameters
----------
memo, default None
Standard signature. Unused
"""
if memo is None:
memo = {}
return self.copy(deep=True)
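So copy.deepcopy(df) ends up doing the same work as df.copy(deep=True). A quick check of that equivalence (a sketch; the values are arbitrary): the numeric data is copied, so modifying the copy leaves the original intact:

from copy import deepcopy
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]})
df2 = deepcopy(df)        # routed through NDFrame.copy(deep=True)
df2.loc[0, 'x'] = 99      # the numeric data was copied...
print(df.loc[0, 'x'])     # ...so the original still prints 1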
You can read NDFrame.copy's docstring in full, but here's the main excerpt:
def copy(self, deep=True):
"""
Make a copy of this object's indices and data.
When ``deep=True`` (default), a new object will be created with a
copy of the calling object's data and indices. Modifications to
the data or indices of the copy will not be reflected in the
original object (see notes below).
(...)
Notes
-----
When ``deep=True``, data is copied but actual Python objects
will not be copied recursively, only the reference to the object.
This is in contrast to `copy.deepcopy` in the Standard Library,
which recursively copies object data (see examples below).
In your example, the data is Python lists, which are therefore not actually copied.
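You can make that sharing visible with an object-dtype column (a small sketch along the lines of the example in the pandas docstring):

from copy import deepcopy
import pandas as pd

df = pd.DataFrame({'a': [[1, 2], [3, 4]]})   # object dtype: each cell holds a list
df2 = deepcopy(df)
df2.loc[0, 'a'].append(99)   # mutate the inner list through the "deep" copy
print(df.loc[0, 'a'])        # [1, 2, 99] -- the original sees the change too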