I'm writing a function that I'd like to translate to C as much as possible using Cython. To do this, I'll need to use linear algebra operations. Here is my function. EDIT: The lesson I've learned is to try to take care of linear algebra outside of loops, which I've largely been able to do. Otherwise, resort to wrapping LAPACK/BLAS or writing my own functions.
import numpy as np
from scipy.stats import multivariate_normal as mv
import itertools
def llf(data, rho, mu, sigma, A, V, n):
'''evaluate likelihood by guass-hermite quadrature
Parameters
----------
data : array
N x J matrix, columns are measurements
rho : array
length L vector of weights for mixture of normals
mu : array
L x K vector of means of mixture of normals
sigma : array
K x L*K matrix of variance matrices for mixture of normals
A : array
J x (K + 1) matrix of loadings
V : array
J x J variance matrix of measurement errors
n : int
number of sample points for quadrature
'''
N = data.shape[0]
L, K = mu.shape
# getting weights and sample points for approximating integral
v, w = np.polynomial.hermite.hermgauss(n)
totllf = 0
for i in range(N):
M_i = data[i, :]
totllf_i = 0
for l in range(L):
rho_l = rho[l]
sigma_l = sigma[:, K*l:K*(l+1)]
mu_l = mu[l, :]
chol_l = np.linalg.cholesky(sigma_l)
for ix in itertools.product(*(list(range(n)) for k in range(K))):
wt = np.prod(w[list(ix)])
pt = np.sqrt(2)*chol_l.dot(v[list(ix)]) + mu_l
totllf_i += wt*rho_l*mv.pdf(M_i, A[:, 0] + A[:, 1:].dot(pt), V)
totllf += np.log(totllf_i)
return totllf
In order to accomplish this I'd need to have functions for matrix multiplication, transpose, determinant, matrix inverse and cholesky decompostion. I've seen some posts on using BLAS
functions, but its really unclear to me how to use these.
EDIT 04/29/18
As suggested, I've taken a memory view approach and initialized everything prior to the loop. My new function is written as
def llf_c(double[:, ::1] data, double[::1] rho, double[:, ::1] mu,
double[:, ::1] sigma, double[:, ::1] A, double[:, ::1] V, int n):
'''evaluate likelihood by guass-hermite quadrature
Parameters
----------
data : array
N x J matrix, columns are measurements
rho : array
length L vector of weights for mixture of normals
mu : array
L x K vector of means of mixture of normals
sigma : array
K x L*K matrix of variance matrices for mixture of normals
A : array
J x (K + 1) matrix of loadings
V : array
J x J variance matrix of measurement errors
n : int
number of sample points for quadrature
'''
cdef Py_ssize_t N = data.shape[0], J = data.shape[1], L = mu.shape[0], K = mu.shape[1]
# initializing indexing variables
cdef Py_ssize_t i, l, j, k
# getting weights and sample points for approximating integral
v_a, w_a = np.polynomial.hermite.hermgauss(n)
cdef double[::1] v = v_a
cdef double[::1] w = w_a
cdef double[::1] v_ix = np.zeros(K, dtype=np.float)
# initializing memory views for cholesky decomposition of sigma matrices
sigma_chol_a = np.zeros((K, L*K), dtype=np.float)
for l in range(L):
sigma_chol_a[:, K*l:K*(l+1)] = np.linalg.cholesky(sigma[:, K*l:K*(l+1)])
cdef double[:, ::1] sigma_chol = sigma_chol_a
# intializing V inverse and determinant
cdef double[:, ::1] V_inv = np.linalg.inv(V)
cdef double V_det = np.linalg.det(V)
# initializing memoryviews for work matrices
cdef double[::1] work_K = np.zeros(K, dtype=np.float)
cdef double[::1] work_J = np.zeros(J, dtype=np.float)
# initializing memoryview for quadrature points
cdef double[::1] pt = np.zeros(K, dtype=np.float)
# initializing memorview for means for multivariate normal
cdef double[::1] loc = np.zeros(J, dtype=np.float)
# initializing values for loop
cdef double[::1] totllf = np.zeros(N, dtype=np.float)
cdef double wt, pdf_init = 1./sqrt(((2*pi)**J)*V_det)
cdef int[:, ::1] ix_vals = np.vstack(itertools.product(*(list(range(n)) for k in range(K)))).astype(np.int32)
cdef Py_ssize_t ix_len = ix_vals.shape[0]
for ix_row in range(ix_len):
ix = ix_vals[ix_row]
# weights and points for quadrature
wt = 1.
for k in range(K):
wt *= w[ix[k]]
v_ix[k] = v[ix[k]]
for l in range(L):
# change of variables
dotmv_c(sigma_chol[:, K*l:K*(l+1)], v_ix, work_K)
for k in range(K):
pt[k] = sqrt(2)*work_K[k]
addvv_c(pt, mu[l, :], pt)
for i in range(N):
# generating demeaned vector for multivariate normal pdf
dotmv_c(A[:, 1:], pt, work_J)
addvv_c(A[:, 0], work_J, work_J)
for j in range(J):
loc[j] = data[i, j] - work_J[j]
# performing matrix products in exponential
# print(wt, rho[l], np.asarray(work_J))
dotvm_c(loc, V_inv, work_J)
totllf[i] += wt*rho[l]*pdf_init*exp(-0.5*dotvv_c(work_J, loc))
return np.log(np.asarray(totllf)).sum()
The dotvm_c
, dotmv_c
and addvv_c
are functions that performance matrix multiplication of vector and matrix, matrix and vector, and elementwise addition of two vectors. I've written these in Cython as well, but am not including for brevity. I'm no longer wrapping any LAPACK functions as I do all other linear algebra prior to the loop using numpy. I have a few more questions. Why do I still have yellow in the loop? (See profile below). I thought everything should be in C now. Also if you have any other suggestions based on new implementation, please let me know.
For instance, in line 221, I get this message upon compiling: "Index should be typed for more efficient access." But I thought I typed the index k. Also, since addvv_c
shows up in yellow, I'll show you it's definition below.
cpdef void addvv_c(double[:] a, double[:] b, double[:] out):
'''add two vectors elementwise
'''
cdef Py_ssize_t i, n = a.shape[0]
for i in range(n):
out[i] = a[i] + b[i]
A few of small points with respect to your optimized Cython/BLAS functions:
ipiv_a = np.zeros(n).astype(np.int32)
cdef int[::1] ipiv = ipiv_a
can have two easy improvements: it doesn't has to go through a temporary variable, and you can directly create an array of type np.int32
rather than create an array of a different type then casting:
cdef int[::1] ipiv = np.zeros(n,dtype=np.int32)
Simiarly, in both fucntions you can initialize B
in a fewer steps by doing
cdef double[:, ::1] B = A.copy()
rather than creating a zero array and copying
The second (more significant) change would be to use C arrays for temporary variables such as the Fortran workspaces. I'd still keep things like return values as numpy arrays since the reference counting and ability to send them back to Python is very useful.
cdef double* work = <double*>malloc(n*n*sizeof(double))
try:
# rest of function
finally:
free(work)
You need to cimport malloc
and free
from libc.stdlib
. The try: ... finally:
ensures the memory is correctly deallocated. Don't go too over the top with this - for example, if it isn't obvious where to deallocate a C array then just use numpy.
The final option to look at is to not have a return value but to modify an input:
cdef void inv_c(double[:,::1] A, double[:,::1] B):
# check that A and B are the right size, then just write into B
# ...
The advantage of this is that if you need to call this in a loop with identical sized inputs then you only need to do one allocation for the whole loop. You could extend this to include the working arrays too, although that might be a little more complicated.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With