I'm using pandas as a ring buffer, but its memory use keeps growing. What am I doing wrong?
Here is the code (edited slightly from the first version of the question):
import pandas as pd
import numpy as np
import resource

tempdata = np.zeros((10000, 3))
tdf = pd.DataFrame(data=tempdata, columns=['a', 'b', 'c'])

i = 0
while True:
    i += 1
    littledf = pd.DataFrame(np.random.rand(1000, 3), columns=['a', 'b', 'c'])
    tdf = pd.concat([tdf[1000:], littledf], ignore_index=True)
    del littledf
    currentmemory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if i % 1000 == 0:
        print 'total memory: %d kb' % (int(currentmemory) / 1000)
This is what I get:
total memory:37945 kb
total memory:38137 kb
total memory:38137 kb
total memory:38768 kb
total memory:38768 kb
total memory:38776 kb
total memory:38834 kb
total memory:38838 kb
total memory:38838 kb
total memory:38850 kb
total memory:38854 kb
total memory:38871 kb
total memory:38871 kb
total memory:38973 kb
total memory:38977 kb
total memory:38989 kb
total memory:38989 kb
total memory:38989 kb
total memory:39399 kb
total memory:39497 kb
total memory:39587 kb
total memory:39587 kb
total memory:39591 kb
total memory:39604 kb
total memory:39604 kb
total memory:39608 kb
total memory:39608 kb
total memory:39608 kb
total memory:39608 kb
total memory:39608 kb
total memory:39608 kb
total memory:39612 kb
Not sure if it's related to this issue: https://github.com/pydata/pandas/issues/2659
Tested on MacBook Air with Anaconda Python
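To rule out the obvious, I checked that pd.concat really does build a brand-new DataFrame on every call (so the old 10000-row buffer becomes garbage each iteration) while the row count stays fixed. A minimal check:

```python
import numpy as np
import pandas as pd

tdf = pd.DataFrame(np.zeros((10000, 3)), columns=['a', 'b', 'c'])
littledf = pd.DataFrame(np.random.rand(1000, 3), columns=['a', 'b', 'c'])

# Drop the oldest 1000 rows and append 1000 new ones.
new_tdf = pd.concat([tdf[1000:], littledf], ignore_index=True)

assert new_tdf is not tdf        # concat allocated a brand-new DataFrame
assert len(new_tdf) == len(tdf)  # the buffer size itself stays at 10000 rows
```

So the buffer never grows in rows; the allocation happens behind the scenes on every iteration.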
Instead of using concat, why not update the DataFrame in place? The value of i % 10 determines which 1000-row slot gets overwritten on each update.
i = 0
while True:
    i += 1
    tdf.iloc[1000 * (i % 10):1000 + 1000 * (i % 10)] = np.random.rand(1000, 3)
    currentmemory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if i % 1000 == 0:
        print 'total memory: %d kb' % (int(currentmemory) / 1000)
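If you don't need pandas features on the buffer itself, the same slot-based idea works with a bare NumPy array, which sidesteps pandas overhead entirely. A minimal sketch (the RingBuffer class and its method names are my own invention, not from the question):

```python
import numpy as np

class RingBuffer(object):
    """Fixed-size row buffer: new blocks overwrite the oldest slot in place."""

    def __init__(self, n_slots, slot_rows, n_cols):
        self.n_slots = n_slots
        self.slot_rows = slot_rows
        # Allocated once; never reallocated afterwards.
        self.data = np.zeros((n_slots * slot_rows, n_cols))
        self.i = 0  # number of blocks written so far

    def push(self, block):
        # Write into slot (i mod n_slots); no new array is created.
        slot = self.i % self.n_slots
        self.data[slot * self.slot_rows:(slot + 1) * self.slot_rows] = block
        self.i += 1

buf = RingBuffer(n_slots=10, slot_rows=1000, n_cols=3)
base = buf.data  # remember the backing array
for _ in range(25):
    buf.push(np.random.rand(1000, 3))
assert buf.data is base  # still the exact same memory: no growth
```

The key design point is the same as in the iloc version: the storage is allocated once up front, and each update mutates a slice of it instead of constructing a new object.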