When I load an array using numpy.loadtxt, it seems to take too much memory. E.g.
a = numpy.zeros(int(1e6))
causes an increase of about 8MB in memory (using htop, or just 8bytes*1million \approx 8MB). On the other hand, if I save and then load this array
numpy.savetxt('a.csv', a)
b = numpy.loadtxt('a.csv')
my memory usage increases by about 100MB! Again I observed this with htop. This was observed while in the iPython shell, and also while stepping through code using Pdb++.
Any idea what's going on here?
After reading jozzas's answer, I realized that if I know ahead of time the array size, there is a much more memory efficient way to do things if say 'a' was an mxn array:
b = numpy.zeros((m,n))
with open('a.csv', 'r') as f:
reader = csv.reader(f)
for i, row in enumerate(reader):
b[i,:] = numpy.array(row)
Saving this array of floats to a text file creates a 24M text file. When you re-load this, numpy goes through the file line-by-line, parsing the text and recreating the objects.
I would expect memory usage to spike during this time, as numpy doesn't know how big the resultant array needs to be until it gets to the end of the file, so I'd expect there to be at least 24M + 8M + other temporary memory used.
Here's the relevant bit of the numpy code, from /lib/npyio.py
:
# Parse each line, including the first
for i, line in enumerate(itertools.chain([first_line], fh)):
vals = split_line(line)
if len(vals) == 0:
continue
if usecols:
vals = [vals[i] for i in usecols]
# Convert each value according to its column and store
items = [conv(val) for (conv, val) in zip(converters, vals)]
# Then pack it according to the dtype's nesting
items = pack_items(items, packing)
X.append(items)
#...A bit further on
X = np.array(X, dtype)
This additional memory usage shouldn't be a concern, as this is just the way python works - while your python process appears to be using 100M of memory, internally it maintains knowledge of which items are no longer used, and will re-use that memory. For example, if you were to re-run this save-load procedure in the one program (save, load, save, load), your memory usage will not increase to 200M.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With