
Pandas - Huge memory consumption

After loading a dataframe with ~15 million rows from a pickle (~250 MB), I perform some search operations on it and then delete some rows in place. During these operations the memory usage skyrockets to 5 GB, and sometimes 7 GB, which is annoying because of swapping (my laptop only has 8 GB of memory).

The point is that this memory is not freed when the operations are finished (i.e. when the last two lines in the code below are executed). So the Python process still takes up to 7 GB of memory.
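
(One way to watch this from inside the script is to print the process's resident set size; a minimal sketch, assuming the third-party psutil package, which is not part of the original setup:)

import os

import psutil

# Resident set size (RSS) of the current Python process, in MB;
# print it before and after the drop loop to see whether the
# memory is actually given back to the OS
rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2
print('RSS: %.0f MB' % rss_mb)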

Any idea why this happens? I'm using Pandas 0.20.3.

Minimal example below. In reality the 'data' variable would have ~15 million rows, but I don't know how to post that much data here.

import datetime

import pandas as pd

data = {'Time': ['2013-10-29 00:00:00', '2013-10-29 00:00:08', '2013-11-14 00:00:00'],
        'Watts': [0, 48, 0]}
df = pd.DataFrame(data, columns=['Time', 'Watts'])
# Convert the strings to datetimes
df['Time'] = pd.to_datetime(df['Time'])
# Make the Time column the index of the dataframe
df.index = df['Time']
# Delete the Time column
df = df.drop('Time', axis=1)

# Get the difference in time between two consecutive data points
differences = df.index.to_series().diff()
# Keep only the differences > 60 minutes
differences = differences[differences > datetime.timedelta(minutes=60)]
# Get the day (as a string) of each data point where data gathering resumed
toRemove = [d.strftime('%Y-%m-%d') for d in differences.index.date]

# Remove the data points belonging to the days where the difference was > 60 minutes
for dataPoint in toRemove:
    df.drop(df.loc[dataPoint].index, inplace=True)
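
(As an aside, the same rows can be removed in a single pass with a boolean mask instead of looping with inplace drops; a sketch, reusing df and toRemove from above:)

# One-pass variant: normalize each timestamp to midnight and
# keep only the rows whose day is not in the removal list
days_to_remove = pd.to_datetime(toRemove)
df = df[~df.index.normalize().isin(days_to_remove)]
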
Asked by RiccB on Nov 07 '22


1 Answer

You might want to try invoking the garbage collector with gc.collect(). See How can I explicitly free memory in Python? for more information.
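
A minimal sketch of that suggestion, dropping references to the large intermediates from the question first, since gc.collect() can only reclaim objects that are no longer referenced:

import gc

# Drop references to the large intermediates first: gc.collect()
# can only reclaim objects that are no longer referenced anywhere
del differences, toRemove

# Force a full collection; the return value is the number of
# unreachable objects that were found and freed
unreachable = gc.collect()
print('collected %d unreachable objects' % unreachable)

Note, though, that even after a successful collection, CPython's allocator does not necessarily return freed memory to the operating system right away, so the process size reported by the OS can stay high.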

Answered by Ryan Stout on Nov 24 '22