
Why doesn't deepcopy of a Pandas DataFrame affect memory usage?

Tags: python, pandas

I have this code:

import random
import resource  # Unix-only module
from copy import deepcopy

import pandas as pd

n = 100000
df = pd.DataFrame({'x': [random.randint(1, 1000) for i in range(n)],
                   'y': [random.randint(1, 1000) for i in range(n)],
                   'z': [random.randint(1, 1000) for i in range(n)]
                   })
df2 = deepcopy(df)
# ru_maxrss is the peak resident set size (kilobytes on Linux, bytes on macOS)
print('memory peak: {kb} kb'.format(kb=resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))

I expect to see a different peak memory usage when I run this code with and without the deepcopy call, but the reported peak is exactly the same in both cases. Isn't deepcopy supposed to clone the object and therefore increase memory usage?

Max Segal asked Apr 18 '19 12:04


1 Answer

When copy.deepcopy is called on an object foo, it checks whether foo provides a __deepcopy__ method and, if so, calls that method instead of copying the object generically. DataFrame extends the NDFrame class, whose __deepcopy__ method delegates the operation to NDFrame.copy.

Here's the implementation of NDFrame.__deepcopy__:

def __deepcopy__(self, memo=None):
    """
    Parameters
    ----------
    memo, default None
        Standard signature. Unused
    """
    if memo is None:
        memo = {}
    return self.copy(deep=True)
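
The same delegation pattern can be sketched with a toy class. This is only an illustration, not pandas code: the Frame class and its attributes are hypothetical, and its copy method is deliberately written to duplicate the outer list only.

```python
from copy import deepcopy


class Frame:
    """Hypothetical container that customises how deepcopy treats it."""

    def __init__(self, values):
        self.values = values

    def copy(self):
        # Copies the outer list only; the elements are shared references.
        return Frame(list(self.values))

    def __deepcopy__(self, memo=None):
        # deepcopy finds this method and calls it instead of recursing itself.
        return self.copy()


original = Frame([[1, 2], [3, 4]])
clone = deepcopy(original)

outer_is_new = clone.values is not original.values
inner_is_shared = clone.values[0] is original.values[0]
print(outer_is_new, inner_is_shared)
```

Because deepcopy hands control to __deepcopy__, how deep the result really is depends entirely on what that method chooses to copy.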

The docstring of NDFrame.copy describes what a deep copy means for a DataFrame. Here's the main excerpt:

def copy(self, deep=True):
    """
    Make a copy of this object's indices and data.
    When ``deep=True`` (default), a new object will be created with a
    copy of the calling object's data and indices. Modifications to
    the data or indices of the copy will not be reflected in the
    original object (see notes below).
    (...)
    Notes
    -----
    When ``deep=True``, data is copied but actual Python objects
    will not be copied recursively, only the reference to the object.
    This is in contrast to `copy.deepcopy` in the Standard Library,
    which recursively copies object data (see examples below).

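Note that this caveat only concerns columns that hold Python objects (object dtype). In your example, the lists of integers are converted to NumPy integer columns when the DataFrame is built, and copy(deep=True) does duplicate those arrays. The copy is small, though: 3 columns of 100000 64-bit integers come to about 2.4 MB. Since ru_maxrss reports the peak resident set size of the process rather than its current usage, a likely explanation is that the peak was already reached while building the temporary Python lists and the DataFrame, and the copy fits within memory that was freed afterwards.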
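
To see the distinction the docstring draws, here is a small sketch, assuming pandas and NumPy are installed. The column names and values are made up for illustration. It compares a numeric column, whose values sit in a NumPy buffer, with an object column, whose cells hold references to Python lists.

```python
from copy import deepcopy

import numpy as np
import pandas as pd

# Numeric column: the values are stored in a NumPy array.
df = pd.DataFrame({"x": [1, 2, 3]})
df2 = deepcopy(df)
shares_numeric = np.shares_memory(df["x"].to_numpy(), df2["x"].to_numpy())

# Object column: each cell stores a reference to a Python list.
dfo = pd.DataFrame({"obj": [[1, 2], [3, 4]]})
dfo2 = deepcopy(dfo)
same_object = dfo2["obj"].iloc[0] is dfo["obj"].iloc[0]

print("numeric buffers shared:", shares_numeric)
print("object cells are the same list:", same_object)
```

If the two flags differ, that matches the docstring: the array of values is duplicated, while Python objects stored inside an object column are not copied recursively.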

Right leg answered Oct 04 '22 20:10