Implementing faster python inner product with BLAS

Question

I found this useful tutorial on using low-level BLAS functions (implemented in Cython) to get big speed improvements over standard numpy linear algebra routines in python. Now, I've successfully gotten the vector product working fine. First I save the following as linalg.pyx:

import cython
import numpy as np
cimport numpy as np

from libc.math cimport exp
from libc.string cimport memset

from scipy.linalg.blas import fblas

REAL = np.float64
ctypedef np.float64_t REAL_t

cdef extern from "/home/jlorince/flda/voidptr.h":
    void* PyCObject_AsVoidPtr(object obj)

ctypedef double (*ddot_ptr) (const int *N, const double *X, const int *incX, const double *Y, const int *incY) nogil
cdef ddot_ptr ddot=<ddot_ptr>PyCObject_AsVoidPtr(fblas.ddot._cpointer)  # vector-vector multiplication 

cdef int ONE = 1
def vec_vec(syn0, syn1, size):
    cdef int lSize = size
    f = <REAL_t>ddot(&lSize, <REAL_t *>(np.PyArray_DATA(syn0)), &ONE, <REAL_t *>(np.PyArray_DATA(syn1)), &ONE)
    return f

(source code for voidptr.h available here)

Once I compile it, it works fine, and is definitely faster than np.inner:

In [1]: import linalg
In [2]: import numpy as np
In [3]: x = np.random.random(100)
In [4]: %timeit np.inner(x,x)
1000000 loops, best of 3: 1.61 µs per loop
In [5]: %timeit linalg.vec_vec(x,x,100)
1000000 loops, best of 3: 483 ns per loop
In [8]: np.all(np.inner(x,x)==linalg.vec_vec(x,x,100))
Out[8]: True

Now, this is great, but only works for the case of calculating the dot/inner product (equivalent in this case) of two vectors. What I need to do now, implement similar functions (that I hope will offer similar speedups) for doing vector-matrix inner products. That is, I want to replicate the functionality of np.inner when passed a 1D array and a 2D matrix:

In [4]: x = np.random.random(5)
In [5]: y = np.random.random((5,5))
In [6]: np.inner(x,y)
Out[6]: array([ 1.42116225,  1.13242989,  1.95690196,  1.87691992,  0.93967486])

This is equivalent to calculating the inner/dot products (again, equivalent for 1D arrays) of the 1D array and each row of the matrix:

In [32]: [np.inner(x,row) for row in y]
Out[32]:
[1.4211622497461549, 1.1324298918119025, 1.9569019618096966,1.8769199192990056, 0.93967485730285505]

From what I've seen of the BLAS documentation, I think I need to start with something like this (using dgemv):

ctypedef double (*dgemv_ptr) (const str *TRANS, const int *M, const int *N, const double *ALPHA, const double *A, const int *LDA, const double *X, const int *incX, const double *BETA, const double *Y, const int *incY)
cdef dgemv_ptr dgemv=<dgemv>PyCObject_AsVoidPtr(fblas.dgemv._cpointer)  # matrix vector multiplication

But I need help (a) defining the actual function that I can use in Python (i.e. a vec-matrix function analogous to vec_vec above), and (b) knowing how to use it to properly replicate the behavior of np.inner, which is what I need for the model I'm implementing.

Edit: Here is the link to relevant BLAS documentation for dgemv, that I need to be using, which is confirmed here:

In [13]: np.allclose(scipy.linalg.blas.fblas.dgemv(1.0,y,x), np.inner(x,y))
Out[13]: True

But using it out of the box like this is actually slower than pure np.inner, so I still need help with the Cython implementation.

Edit2 Here's my latest attempt at this, which compiles fine but crashes python with a segmentation fault whenever I try to run it:

cdef int ONE = 1
cdef char tr = 'n'
cdef REAL_t ZEROF = <REAL_t>0.0
cdef REAL_t ONEF = <REAL_t>1.0
def mat_vec(mat,vec,mat_rows,mat_cols):
    cdef int m = mat_rows
    cdef int n = mat_cols
    out = <REAL_t>dgemv(&tr, &m, &n, &ONEF, <REAL_t *>(np.PyArray_DATA(mat)), &m, <REAL_t *>(np.PyArray_DATA(vec)), &ONE, &ZEROF, NULL, &ONE)
    return out

After compiling, I try to run linalg.mat_vec(y,x,5,5), (using the same x and y as above) but this just crashes. I think I'm close, but don't know what else to change...

Dima Tisnek · Accepted Answer

Per @Pietro Saccardi:

int dgemv_(char *trans, integer *m, integer *n, doublereal *
           alpha, doublereal *a, integer *lda, doublereal *x, integer *incx, 
           doublereal *beta, doublereal *y, integer *incy)

...

Y      - DOUBLE PRECISION array of DIMENSION at least   
         ( 1 + ( m - 1 )*abs( INCY ) ) when TRANS = 'N' or 'n'   
         and at least   
         ( 1 + ( n - 1 )*abs( INCY ) ) otherwise.   
         Before entry with BETA non-zero, the incremented array Y   
         must contain the vector y. On exit, Y is overwritten by the
         updated vector y.

I doubt you can use NULL for Y in the call.

Implementing faster python inner product with BLAS

Tags:

python

numpy

cython

linear-algebra

blas

moustachio

1 Answers

Dima Tisnek

Recent Activity

Donate For Us

Implementing faster python inner product with BLAS

Tags:

python

numpy

cython

linear-algebra

blas

moustachio

1 Answers

Dima Tisnek

Related questions

Recent Activity

Donate For Us