I am working on my first large python project. I have one function which has the following code in it:
# EXPAND THE EXPECTED VALUE TO APPLY TO ALL STATES,
# THEN UPDATE fullFnMat
EV_subset_expand = np.kron(EV_subset, np.ones((nrows, 1)))
fullFnMat[key] = staticMat[key] + EV_subset_expand
In my code profiler, it seems like this kronecker product is actually taking up a huge amount of time.
Function was called by...
ncalls tottime cumtime
/home/stevejb/myhg/dpsolve/ootest/tests/ddw2011/profile_dir/BellmanEquation.py:17(bellmanFn) <- 19 37.681 38.768 /home/stevejb/myhg/dpsolve/ootest/tests/ddw2011/profile_dir/dpclient.py:467(solveTheModel)
{numpy.core.multiarray.concatenate} <- 342 27.319 27.319 /usr/lib/pymodules/python2.7/numpy/lib/shape_base.py:665(kron)
/home/stevejb/myhg/dpsolve/ootest/tests/ddw2011/profile_dir/dpclient.py:467(solveTheModel) <- 1 11.041 91.781 <string>:1(<module>)
{method 'argsort' of 'numpy.ndarray' objects} <- 19 7.692 7.692 /usr/lib/pymodules/python2.7/numpy/core/fromnumeric.py:597(argsort)
/usr/lib/pymodules/python2.7/numpy/core/numeric.py:789(outer) <- 171 2.526 2.527 /usr/lib/pymodules/python2.7/numpy/lib/shape_base.py:665(kron)
{method 'max' of 'numpy.ndarray' objects} <- 209 2.034 2.034 /home/stevejb/myhg/dpsolve/ootest/tests/ddw2011/profile_dir/dpclient.py:391(getValPolMatrices)
Is there a way to get faster kronecker products in Numpy? It seems like it shouldn't take as long as it is.
You can certainly take a look at the source for np.kron
. It can be found in numpy/lib/shape_base.py
, and you can see if there are improvements that can be made or simplifications that might make it more efficient. Alternatively you could write your own using Cython or some other binding to a low level language to try to eek out better performance.
Or as @matt suggested something like the following might be natively faster:
import numpy as np
nrows = 10
a = np.arange(100).reshape(10,10)
b = np.tile(a,nrows).reshape(nrows*a.shape[0],-1) # equiv to np.kron(a,np.ones((nrows,1)))
or:
b = np.repeat(a,nrows*np.ones(a.shape[0],np.int),axis=0)
Timings:
In [80]: %timeit np.tile(a,nrows).reshape(nrows*a.shape[0],-1)
10000 loops, best of 3: 25.5 us per loop
In [81]: %timeit np.kron(a,np.ones((nrows,1)))
10000 loops, best of 3: 117 us per loop
In [91]: %timeit np.repeat(a,nrows*np.ones(a.shape[0],np.int),0)
100000 loops, best of 3: 12.8 us per loop
Using np.repeat
for the sized arrays in the above example gives a pretty nice 10x speed-up, which isn't too shabby.
Maybe np.kron()
is allocating memory and then you're throwing it away. Try using np.tile()
instead. I don't know if that allocates more memory or plays indexing tricks under the covers. If you're only multiplying EV_subset by ones, you don't really need to call np.kron()
.
The following may help (in the general case when one of the arrays is not 'ones'). Example is for two arrays A,B of shape (a,b,c) and (d,e,f); Generalize as needed.
Gets it done in a single 'multiply' op and a quick reshape.
kprod = A[:,newaxis,:,newaxis,:,newaxis] * B[newaxis,:, newaxis,:, newaxis,:]
#
# kprod.shape = (a,d,b,e,c,f) now; is full outer product with desired arrangement
# in memory.
kprod.shape = (a*d,b*e,c*f) # reshape 'in place'
(maybe this is kron(B,A) instead of kron(A,B); reverse A & B if needed)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With