After loading a dataframe from a pickle with ~15 million rows (which occupies ~250 MB), I perform some search operations on it and then delete some rows in place. During these operations the memory usage skyrockets to 5 GB, and sometimes 7 GB, which is annoying because of swapping (my laptop has only 8 GB of memory).
The problem is that this memory is not freed once the operations are finished (i.e. after the last two lines in the code below have executed). The Python process still holds up to 7 GB of memory.
Any idea why this happens? I'm using Pandas 0.20.3.
A minimal example is below. In reality the 'data' variable would have ~15 million rows, but I don't know how to post that here.
import datetime
import pandas as pd
data = {'Time':['2013-10-29 00:00:00', '2013-10-29 00:00:08', '2013-11-14 00:00:00'], 'Watts': [0, 48, 0]}
df = pd.DataFrame(data, columns=['Time', 'Watts'])
# Convert string to datetime
df['Time'] = pd.to_datetime(df['Time'])
# Make the Time column the index of the dataframe
df.index = df['Time']
# Delete the original Time column
df = df.drop('Time', axis=1)
# Get the difference in time between two consecutive data points
differences = df.index.to_series().diff()
# Keep only the differences > 60 mins
differences = differences[differences > datetime.timedelta(minutes=60)]
# Get the day strings of the data points where data gathering resumed
toRemove = [date.strftime('%Y-%m-%d') for date in differences.index.date]
# Remove the data points belonging to each day where the difference was > 60 mins
for dataPoint in toRemove:
    df.drop(df[dataPoint].index, inplace=True)
You might want to try invoking the garbage collector with gc.collect(). See How can I explicitly free memory in Python? for more information.
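A minimal sketch of that, using the variable names from the question (note that even after a collection pass, whether the process actually returns freed memory to the OS depends on the allocator, so the resident size may not shrink):

import gc
# Drop references to the large intermediates first, so the collector
# has something to reclaim, then force a full collection pass.
del differences, toRemove
gc.collect()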
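Separately, the per-day drop loop rewrites the frame once per removed day, which can drive up peak memory; one alternative worth trying is to remove all of those rows in a single vectorized pass. A sketch using the names from the question (bad_days is a hypothetical helper name):

# Midnight timestamps of the days on which data gathering resumed
bad_days = differences.index.normalize()
# Keep every row whose day is not in bad_days: one boolean-mask pass
# instead of one drop() call per day.
df = df[~df.index.normalize().isin(bad_days)]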