 

Pandas GroupBy memory deallocation

Problem

I noticed that memory allocated while iterating through a Pandas GroupBy object is not deallocated after iteration. I use resource.getrusage(resource.RUSAGE_SELF).ru_maxrss (see the second answer in this post for details) to measure the peak memory (maximum resident set size, reported in kilobytes on Linux) used by the Python process.
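Because ru_maxrss is a high-water mark, it never decreases, even after memory has been freed. It can show that a new peak was reached, but it cannot by itself confirm that memory was returned. A minimal sketch that makes this visible, assuming a Linux or macOS host (the helper name peak_rss_gb is hypothetical; the scale factor reflects that Linux reports kilobytes while macOS reports bytes):

```python
import resource
import sys


def peak_rss_gb() -> float:
    """Hypothetical helper: peak resident set size of this process, in GB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes, macOS in bytes.
    scale = 1e9 if sys.platform == "darwin" else 1e6
    return peak / scale


before = peak_rss_gb()
buf = bytearray(200 * 1024 * 1024)  # ~200 MB buffer, zero-filled
during = peak_rss_gb()
del buf  # the buffer is freed here...
after = peak_rss_gb()  # ...but the high-water mark does not go back down

print(f"before={before:.3f} during={during:.3f} after={after:.3f}")
```

Reading the current (not peak) resident size requires a platform-specific source such as /proc/self/status on Linux.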

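import resource
import gc

import pandas as pd
import numpy as np

# 4000 row labels drawn from 100 distinct values
i = np.random.choice(100, 4000)
cols = list(range(int(2e4)))

# 4000 x 20000 DataFrame of int64 ones (about 640 MB)
df = pd.DataFrame(1, index=i, columns=cols)

gb = df.groupby(level=0)
# gb = list(gb)  # solution 2: materialize all groups once
for _ in range(3):
    # ru_maxrss is in kilobytes on Linux, so / 1e6 gives GB
    print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1e6)
    for idx, x in enumerate(gb):
        if idx == 0:
            print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1e6)
    # del idx, x    # solution 1 (together with gc.collect())
    # gc.collect()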

prints the following peak memory usage (in GB):

0.671732
1.297424
1.297952
1.923288
1.923288
2.548624

Solutions

Uncommenting del idx, x and gc.collect() fixes the problem. However, I have to del every variable that references a DataFrame returned by iterating over the groupby, which can be a pain depending on the code in the inner for loop. The new printed memory usages are:

0.671768
1.297412
1.297992
1.297992
1.297992
1.297992
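One way to avoid the manual del is to move the inner loop body into a function: its local names, including the per-group DataFrames, go out of scope when the function returns. An explicit gc.collect() is still kept after each call in case the groups are held alive by reference cycles, which reference counting alone does not free. A sketch under those assumptions, with a hypothetical process_groups function and small placeholder data:

```python
import gc

import numpy as np
import pandas as pd


def process_groups(gb):
    """Hypothetical inner-loop body. Every per-group DataFrame it touches
    is local to this function, so no manual `del` is needed."""
    total = 0
    for _, group in gb:
        total += int(group.to_numpy().sum())  # placeholder per-group work
    return total


idx = np.random.choice(100, 400)
df = pd.DataFrame(1, index=idx, columns=range(50))
gb = df.groupby(level=0)

for _ in range(3):
    result = process_groups(gb)
    gc.collect()  # reclaim anything the loop left behind in reference cycles

print(result)
```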

Alternatively, I can uncomment gb = list(gb). The resulting memory usages are roughly the same as those from the previous solution:

1.32874
1.32874
1.32874
1.32874
1.32874
1.32874
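gb = list(gb) builds every (key, sub-DataFrame) pair once, and later passes iterate over those same objects instead of creating new ones, which is consistent with the flat readings. The trade-off is that all groups are held in memory at the same time; here the reading is about double the ~0.67 GB seen before the loop in the first run. A small sketch with placeholder data showing that the list hands back identical objects on every pass:

```python
import numpy as np
import pandas as pd

# Placeholder data: 10 groups of 10 rows each, 5 columns of ones.
idx = np.repeat(np.arange(10), 10)
df = pd.DataFrame(1, index=idx, columns=range(5))

# Materialize every (key, sub-DataFrame) pair once, up front.
groups = list(df.groupby(level=0))

pass1 = [frame for _, frame in groups]
pass2 = [frame for _, frame in groups]

# Iterating the list returns the very same objects on every pass,
# so no new per-group DataFrames are created after the first one.
same_objects = all(a is b for a, b in zip(pass1, pass2))
print(len(groups), same_objects)  # 10 True
```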

Questions

  1. Why is memory for DataFrames resulting from iteration through the groupby not deallocated after iteration is completed?
  2. Is there a better solution than the two above? If not, which of these two solutions is "better"?
asked Mar 03 '16 at 21:03 by Alex



1 Answer

Memory Weirdness

This is very interesting! You do not need del idx, x. Using only gc.collect() kept memory constant for me, which is much cleaner than having del statements inside the loop.
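gc.collect() being sufficient on its own is consistent with the per-group objects being kept alive by reference cycles: reference counting cannot free an object that (indirectly) refers to itself, and only the cyclic garbage collector reclaims it. This is a generic Python illustration with a hypothetical Node class, not a description of pandas internals:

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for an object that sits in a reference cycle."""

    def __init__(self):
        self.payload = bytearray(10 * 1024 * 1024)  # ~10 MB of data
        self.self_ref = self  # cycle: the object refers to itself


gc.disable()  # keep automatic collections out of the way for the demo
x = Node()
probe = weakref.ref(x)  # observe the object without keeping it alive

del x  # last external reference gone, but the cycle keeps refcount > 0
alive_after_del = probe() is not None

gc.collect()  # the cycle detector finds the unreachable cycle and frees it
alive_after_collect = probe() is not None
gc.enable()

print(alive_after_del, alive_after_collect)  # True False
```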

answered Oct 17 '22 at 02:10 by tmthydvnprt
