I am using NumPy to handle some large data matrices (of around ~50GB in size). The machine where I am running this code has 128GB of RAM so doing simple linear operations of this magnitude shouldn't be a problem memory-wise.
However, I am witnessing a huge memory growth (to more than 100GB) when computing the following code in Python:
import numpy as np
# memory allocations (everything works fine)
a = np.zeros((1192953, 192, 32), dtype='f8')
b = np.zeros((1192953, 192), dtype='f8')
c = np.zeros((192, 32), dtype='f8')
a[:] = b[:, :, np.newaxis] - c[np.newaxis, :, :] # memory explodes here
Please note that initial memory allocations are done without any problems. However, when I try to perform the subtract operation with broadcasting, the memory grows to more than 100GB. I always thought that broadcasting would avoid making extra memory allocations but now I am not sure if this is always the case.
As such, can someone give some details on why this memory growth is happening, and how the following code could be rewritten using more memory efficient constructs?
I am running the code in Python 2.7 within IPython Notebook.
@rth's suggestion to do the operation in smaller batches is a good one. You could also try using the function np.subtract
and give it the destination array to avoid creating an addtional temporary array. I also think you don't need to index c
as c[np.newaxis, :, :]
, because it is already a 3-d array.
So instead of
a[:] = b[:, :, np.newaxis] - c[np.newaxis, :, :] # memory explodes here
try
np.subtract(b[:, :, np.newaxis], c, a)
The third argument of np.subtract
is the destination array.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With