I think I have a big-data scenario (N = 1e6 points of dimension 3). I need to do some matrix manipulation, such as einsum, matrix inversion, etc., several times in my code. To give an idea, I want to do something like the following:
import numpy as np
import numpy.random as rd
ndata, kdata = int(1e6), int(1e5)
x = rd.normal(0, 1, (ndata, kdata, 3, 3))
y = rd.normal(0, 1, (ndata, kdata, 3, 3))
For small ndata and kdata, the following would be an efficient and convenient approach:
xy = np.einsum('pqrs, pqsu -> pqru', x, y)
Since my ndata and kdata are large, the approach above becomes memory-bound, so the next bet would be a dot product inside a nested for loop over ndata and kdata, as follows:
xyloop1 = np.empty((ndata, kdata, 3, 3))
for j in range(ndata):
    for k in range(kdata):
        xyloop1[j, k] = np.dot(x[j, k], y[j, k])
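For illustration, on much smaller (hypothetical) sizes where everything fits in memory, the two approaches agree:

```python
import numpy as np

# Small, made-up sizes so both approaches are cheap to run.
ndata, kdata = 40, 30
x = np.random.normal(0, 1, (ndata, kdata, 3, 3))
y = np.random.normal(0, 1, (ndata, kdata, 3, 3))

# Approach 1: one vectorised einsum over all pairs of 3x3 matrices.
xy = np.einsum('pqrs, pqsu -> pqru', x, y)

# Approach 2: explicit nested loop of 3x3 dot products.
xyloop1 = np.empty((ndata, kdata, 3, 3))
for j in range(ndata):
    for k in range(kdata):
        xyloop1[j, k] = np.dot(x[j, k], y[j, k])

print(np.allclose(xy, xyloop1))  # → True
```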
From what I have been taught, for loops are nasty in Python. I also want to exploit the benefits of numpy, so I thought a block-matrix approach would be preferable, something like the following:
nstep = 200
ndiv = ndata // nstep
kstep = 200
kdiv = kdata // kstep
xyloop2 = np.empty((ndata, kdata, 3, 3))
for j in range(ndiv):
    ji, jf = j * nstep, (j + 1) * nstep
    for k in range(kdiv):
        ki, kf = k * kstep, (k + 1) * kstep
        xyloop2[ji:jf, ki:kf] = np.einsum('pqrs, pqsu -> pqru', x[ji:jf, ki:kf], y[ji:jf, ki:kf])
Also, I need xy (or xyloop1, or xyloop2) for further calculations, so I have to write and read it after every computation. Given the bandwidth of the system I/O, do you reckon the best choice is approach 3, since it means less I/O and fewer iterations compared with approach 2? If you have any other idea, or need more info, please let me know.
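One way to keep both memory and I/O under control is to write each block straight into a disk-backed array as it is computed. This is only a sketch under assumptions of my own (smaller made-up sizes, a hypothetical output filename, and np.memmap as the storage mechanism):

```python
import numpy as np

# Hypothetical, smaller sizes for illustration; scale nstep to taste.
ndata, kdata, nstep = 1000, 50, 200
x = np.random.normal(0, 1, (ndata, kdata, 3, 3))
y = np.random.normal(0, 1, (ndata, kdata, 3, 3))

# Disk-backed output array: each block is written as it is computed,
# so only one block of the product needs to live in RAM at a time.
out = np.memmap('xy_blocks.dat', dtype=np.float64, mode='w+',
                shape=(ndata, kdata, 3, 3))

for j in range(0, ndata, nstep):
    out[j:j + nstep] = np.einsum('pqrs, pqsu -> pqru',
                                 x[j:j + nstep], y[j:j + nstep])
out.flush()  # make sure everything is pushed to disk
```

Later computations can then reopen the same file with mode='r' instead of holding the whole product in memory.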
I am new to Stack Overflow, so please be gentle with me :). Any help will be highly appreciated. BTW, I am trying to solve a mixture-modelling problem on big data. Thanks!
While I agree with the comment that the only way to know for sure is to profile things for yourself, there are a few guiding principles that can help you write efficient numpy code on the first try. Here are a few suggestions for your problem:

- Calling np.dot in a tight loop is expensive here: each call creates an output array but only performs 27 add-multiplies, so Python and allocation overhead dominates the arithmetic.
- A few large numpy calls will be more efficient than many small ones.
- For tiny (3x3) matrices, a broadcasting-based sum may even beat np.einsum. Try C = np.sum(A[..., :, :, None] * B[..., None, :, :], axis=-2) (this is quite speculative, though).

So I would try something like the following:
xyloop2 = np.empty((ndata, kdata, 3, 3))
for i in range(ndata):
    xyloop2[i] = np.sum(x[i, :, :, :, None] * y[i, :, None, :, :], axis=-2)
Similar to approach 2, but with a much simpler (and more efficient) for loop. I also swapped out the matrix multiply for something I think might be faster.
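For what it's worth, on reasonably recent NumPy, np.matmul (the @ operator) also broadcasts over the leading dimensions of stacked 3x3 matrices, so it is worth including in any profiling run. A minimal sketch on small, made-up sizes:

```python
import numpy as np

ndata, kdata = 20, 10  # small, hypothetical sizes
x = np.random.normal(0, 1, (ndata, kdata, 3, 3))
y = np.random.normal(0, 1, (ndata, kdata, 3, 3))

# matmul treats the last two axes as matrices and broadcasts the rest,
# so this computes the same contraction as einsum('pqrs,pqsu->pqru', x, y).
xy_matmul = x @ y

xy_einsum = np.einsum('pqrs, pqsu -> pqru', x, y)
print(np.allclose(xy_matmul, xy_einsum))  # → True
```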
Apparently, sometimes einsum is efficient. Examples with p, q, r, s equal to 100, 50, 3, 3:
Example I:
%timeit tt = np.einsum('pqrs, pqsu -> pqru', x, y)
100 loops, best of 3: 3.45 ms per loop
%timeit zz = np.sum(x[:, :, :, :, None] * y[:, :, None, :, :], axis=-2)
10000 loops, best of 3: 153 µs per loop
Example II (with y of shape (p, q, s) here):
%timeit zz = np.sum(x * y[:, :, None, :], axis=-1)
1000 loops, best of 3: 274 µs per loop
%timeit tt = np.einsum('pqrs, pqs -> pqr', x, y)
10000 loops, best of 3: 151 µs per loop
np.allclose(zz, tt)
True
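The two contractions benchmarked above can be checked for agreement on the same shapes (a quick sketch; the timings themselves will of course vary by machine and NumPy version):

```python
import numpy as np

p, q, r, s, u = 100, 50, 3, 3, 3
x = np.random.normal(0, 1, (p, q, r, s))
y4 = np.random.normal(0, 1, (p, q, s, u))   # 4-D y for Example I
y3 = np.random.normal(0, 1, (p, q, s))      # 3-D y for Example II

# Example I contraction: matrix product over the last two axes.
tt4 = np.einsum('pqrs, pqsu -> pqru', x, y4)
zz4 = np.sum(x[:, :, :, :, None] * y4[:, :, None, :, :], axis=-2)

# Example II contraction: matrix-vector product over the last axis.
tt3 = np.einsum('pqrs, pqs -> pqr', x, y3)
zz3 = np.sum(x * y3[:, :, None, :], axis=-1)

print(np.allclose(zz4, tt4), np.allclose(zz3, tt3))  # → True True
```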