I need to compute the trace of a matrix across all its diagonals. That is, for an nxm matrix, the operation should produce n+m-1 'traces'. Here is an example program:
import numpy as np
A=np.arange(12).reshape(3,4)
def function_1(A):
output=np.zeros(A.shape[0]+A.shape[1]-1)
for i in range(A.shape[0]+A.shape[1]-1):
output[i]=np.trace(A,A.shape[1]-1-i)
return output
A
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
function_1(A)
array([ 3., 9., 18., 15., 13., 8.])
My hope is to find a way to replace the loop in the program, since I need to do this computation many times on very large matrices. One avenue that looks promising is to use numpy.einsum, but I can't quite figure out how to do it. Alternatively I have looked into rewriting the problem entirely with loops in cython:
%load_ext cythonmagic
%%cython
import numpy as np
cimport numpy as np
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
def function_2(long [:,:] A):
cdef int n=A.shape[0]
cdef int m=A.shape[1]
cdef long [::1] output = np.empty(n+m-1,dtype=np.int64)
cdef size_t l1
cdef int i,j, k1
cdef long out
it_list1=range(m)
it_list2=range(m,m+n-1)
for l1 in range(len(it_list1)):
k1=it_list1[l1]
i=0
j=m-1-k1
out=0
while (i<n)&(j<m):
out+=A[i,j]
i+=1
j+=1
output[k1]=out
for l1 in range(len(it_list2)):
k1=it_list2[l1]
i=k1-m+1
j=0
out=0
while (i<n)&(j<m):
out+=A[i,j]
i+=1
j+=1
output[k1]=out
return np.array(output)
The cython program outperforms the program looping through np.trace:
%timeit function_1(A)
10000 loops, best of 3: 62.7 µs per loop
%timeit function_2(A)
100000 loops, best of 3: 9.66 µs per loop
So, basically I want to get feedback on whether there was a more efficient way to use numpy/scipy routines, or if I have probably achieved the fastest way using cython.
If you want to stay away from Cython, building a diagonal index array and using np.bincount
may do the trick:
>>> import numpy as np
>>> a = np.arange(12).reshape(3, 4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> rows, cols = a.shape
>>> rows_arr = np.arange(rows)
>>> cols_arr = np.arange(cols)
>>> diag_idx = rows_arr[:, None] - (cols_arr - (cols - 1))
>>> diag_idx
array([[3, 2, 1, 0],
[4, 3, 2, 1],
[5, 4, 3, 2]])
>>> np.bincount(diag_idx.ravel(), weights=a.ravel())
array([ 3., 9., 18., 15., 13., 8.])
By my timings, for your example input, it is 4x faster than your original pure Python method. So I don't think it is going to be faster than your Cython code, but you may want to time it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With