Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multipying and adding inside of a too large array

Tags:

python

numpy

I have an array A that is of the shape (M,N) now I'd like to do the operation

R = (A[:,newaxis,:] * A[newaxis,:,:]).sum(2)

which should yield an (MxM) array. Now the problem is that the array is quite large and I get a Memory error, because the MxMxN array won't fit into memory.

what would be the best strategy to get this done? C? map()? or is there a special function for this yet?

thank you, David

like image 853
Magellan88 Avatar asked Jun 07 '13 14:06

Magellan88


1 Answers

I'm not sure how large you arrays are but the following is equivalent:

R = np.einsum('ij,kj',A,A)

And can be quite a bit faster and is much less memory intensive:

In [7]: A = np.random.random(size=(500,400))

In [8]: %timeit R = (A[:,np.newaxis,:] * A[np.newaxis,:,:]).sum(2)
1 loops, best of 3: 1.21 s per loop

In [9]: %timeit R = np.einsum('ij,kj',A,A)
10 loops, best of 3: 54 ms per loop

If I increase the size of A to (500,4000), np.einsum plows through the calculation in about 2 seconds, whereas the original formulation grinds my machine to a halt due to the size of the temporary array it has to create.

Update:

As @Jaime pointed out in the comments, np.dot(A,A.T) is also an equivalent formulation of the problem, and can be even faster than the np.einsum solution. Full credit to him for pointing that out, but in case he doesn't post it as a formal solution, I wanted to pull it out into the main answer.

like image 116
JoshAdel Avatar answered Oct 22 '22 03:10

JoshAdel