 

Memory growth with broadcast operations in NumPy

I am using NumPy to handle some large data matrices (around 50 GB in size). The machine where I am running this code has 128 GB of RAM, so simple linear operations of this magnitude shouldn't be a problem memory-wise.

However, I am seeing huge memory growth (to more than 100 GB) when running the following code in Python:

import numpy as np

# memory allocations (everything works fine)
a = np.zeros((1192953, 192, 32), dtype='f8')
b = np.zeros((1192953, 192), dtype='f8')
c = np.zeros((192, 32), dtype='f8')

a[:] = b[:, :, np.newaxis] - c[np.newaxis, :, :] # memory explodes here

Please note that the initial memory allocations work without any problems. However, when I perform the subtraction with broadcasting, memory usage grows to more than 100 GB. I always thought that broadcasting avoids extra memory allocations, but now I am not sure this is always the case.
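A scaled-down sketch of what is going on (array sizes shrunk; the `nbytes` helper is hypothetical, not a NumPy function). Broadcasting the inputs creates no copies, because the repeated axis gets a stride of 0, but the subtraction still allocates its full-size result before it is copied into `a`. At the question's sizes that temporary is as large as `a` itself, which roughly accounts for the observed peak.

```python
import numpy as np

# Scaled-down stand-ins with the same trailing shapes as in the question.
n, m, k = 1000, 192, 32
b = np.ones((n, m), dtype='f8')
c = np.ones((m, k), dtype='f8')

# Broadcasting an input is free: broadcast_to returns a view whose repeated
# axis has stride 0, so it shares c's buffer instead of copying it.
c_view = np.broadcast_to(c, (n, m, k))
print(c_view.strides)               # (0, 256, 8)
print(np.shares_memory(c_view, c))  # True

# The result of the subtraction, however, is a new full-size array.
tmp = b[:, :, np.newaxis] - c
print(tmp.nbytes)                   # 49152000 == n * m * k * 8

def nbytes(shape, itemsize=8):
    """Hypothetical helper: bytes needed for a float64 array of `shape`."""
    return int(np.prod(shape, dtype=np.int64)) * itemsize

# At the question's real sizes, the temporary is as large as `a` itself.
a_bytes = nbytes((1192953, 192, 32))  # 58636025856 bytes, ~58.6 GB
b_bytes = nbytes((1192953, 192))      # 1832375808 bytes, ~1.8 GB
peak = 2 * a_bytes + b_bytes          # a + temporary + b, ~119.1 GB
```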

Can someone explain why this memory growth happens, and how the code above could be rewritten using more memory-efficient constructs?

I am running the code in Python 2.7 within IPython Notebook.

asked Jul 21 '15 10:07 by Cesar


1 Answer

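@rth's suggestion to do the operation in smaller batches is a good one. You could also try using the function np.subtract and giving it the destination array, which avoids creating an additional full-size temporary array: the expression b[:, :, np.newaxis] - c[np.newaxis, :, :] allocates its result as a new array the same size as a, and that array is then copied into a. I also think you don't need to index c as c[np.newaxis, :, :]. c is 2-d, but broadcasting automatically prepends a length-1 leading axis, so it already lines up with the 3-d b[:, :, np.newaxis].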
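For reference, the batching idea can be sketched as follows, assuming a hypothetical helper named `subtract_in_chunks` (not part of NumPy) and an arbitrary default batch size. Each iteration only materializes a temporary of `chunk` rows, so with `chunk=100000` the temporary is about 4.9 GB instead of about 58.6 GB.

```python
import numpy as np

def subtract_in_chunks(b, c, out, chunk=100000):
    """Hypothetical helper: fill `out` with b[:, :, np.newaxis] - c in row batches.

    Each iteration creates a temporary of only `chunk` rows, so peak extra
    memory is bounded by chunk * b.shape[1] * c.shape[1] * itemsize bytes.
    """
    n = b.shape[0]
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # The right-hand side allocates a chunk-sized temporary; the assignment
        # then copies it into the matching slice of `out`.
        out[start:stop] = b[start:stop, :, np.newaxis] - c
    return out

# Small-scale usage check (rows shrunk from the question's 1192953).
rng = np.random.RandomState(0)
b = rng.rand(1000, 192)
c = rng.rand(192, 32)
a = np.empty((1000, 192, 32))
subtract_in_chunks(b, c, a, chunk=128)
print(np.array_equal(a, b[:, :, np.newaxis] - c))  # True
```

The two ideas can also be combined: replacing the assignment in the loop with np.subtract(b[start:stop, :, np.newaxis], c, out=out[start:stop]) removes even the chunk-sized temporary.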

So instead of

a[:] = b[:, :, np.newaxis] - c[np.newaxis, :, :] # memory explodes here

try

np.subtract(b[:, :, np.newaxis], c, a)

The third argument of np.subtract (its out parameter) is the destination array, so the result is written directly into a.
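A small sketch with toy arrays (the shapes are made up for illustration) showing that passing the destination writes the result into the preallocated array and returns that same object. `out=a` is the keyword form of the positional third argument.

```python
import numpy as np

b = np.arange(6, dtype='f8').reshape(3, 2)  # small stand-in for b
c = np.ones((2, 4), dtype='f8')             # small stand-in for c (2-d, as in the question)
a = np.zeros((3, 2, 4), dtype='f8')         # preallocated destination

# Passing `a` as the output writes the result straight into its buffer
# instead of allocating a temporary and copying it over.
res = np.subtract(b[:, :, np.newaxis], c, out=a)

print(res is a)  # True: the ufunc returns the array it wrote into
print(a[1, 0])   # [1. 1. 1. 1.]  (b[1, 0] is 2.0, minus 1.0)
```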

answered Oct 22 '22 23:10 by Warren Weckesser