I'm trying to wrap the LAPACK function dgtsv (a solver for tridiagonal systems of equations) using Cython.
I came across this previous answer, but since dgtsv is not one of the LAPACK functions that are wrapped in scipy.linalg I don't think I can use this particular approach. Instead I've been trying to follow this example. 
Here's the contents of my lapacke.pxd file:
ctypedef int lapack_int
cdef extern from "lapacke.h" nogil:
    int LAPACK_ROW_MAJOR
    int LAPACK_COL_MAJOR
    lapack_int LAPACKE_dgtsv(int matrix_order,
                             lapack_int n,
                             lapack_int nrhs,
                             double * dl,
                             double * d,
                             double * du,
                             double * b,
                             lapack_int ldb)
...here's my thin Cython wrapper in _solvers.pyx:
#!python
cimport cython
from lapacke cimport *
cpdef TDMA_lapacke(double[::1] DL, double[::1] D, double[::1] DU,
                   double[:, ::1] B):
    cdef:
        lapack_int n = D.shape[0]
        lapack_int nrhs = B.shape[1]
        lapack_int ldb = B.shape[0]
        double * dl = &DL[0]
        double * d = &D[0]
        double * du = &DU[0]
        double * b = &B[0, 0]
        lapack_int info
    info = LAPACKE_dgtsv(LAPACK_ROW_MAJOR, n, nrhs, dl, d, du, b, ldb)
    return info
...and here's a Python wrapper and test script:
import numpy as np
from scipy import sparse
from cymodules import _solvers
def trisolve_lapacke(dl, d, du, b, inplace=False):
    if (dl.shape[0] != du.shape[0] or dl.shape[0] != d.shape[0] - 1
            or b.shape != d.shape):
        raise ValueError('Invalid diagonal shapes')
    if b.ndim == 1:
        # b is (LDB, NRHS)
        b = b[:, None]
    # be sure to force a copy of d and b if we're not solving in place
    if not inplace:
        d = d.copy()
        b = b.copy()
    # this may also force copies if arrays are improperly typed/noncontiguous
    dl, d, du, b = (np.ascontiguousarray(v, dtype=np.float64)
                    for v in (dl, d, du, b))
    # b will now be modified in place to contain the solution
    info = _solvers.TDMA_lapacke(dl, d, du, b)
    print info
    return b.ravel()
def test_trisolve(n=20000):
    dl = np.random.randn(n - 1)
    d = np.random.randn(n)
    du = np.random.randn(n - 1)
    M = sparse.diags((dl, d, du), (-1, 0, 1), format='csc')
    x = np.random.randn(n)
    b = M.dot(x)
    x_hat = trisolve_lapacke(dl, d, du, b)
    print "||x - x_hat|| = ", np.linalg.norm(x - x_hat)
Unfortunately, test_trisolve just segfaults on the call to _solvers.TDMA_lapacke. 
I'm pretty sure my setup.py is correct - ldd _solvers.so shows that _solvers.so is being linked to the correct shared libraries at runtime.
I'm not really sure how to proceed from here - any ideas?
A brief update:
for smaller values of n I tend not to get segfaults immediately, but I do get nonsense results (||x - x_hat|| ought to be very close to 0): 
In [28]: test_trisolve2.test_trisolve(10)
0
||x - x_hat|| =  6.23202576396
In [29]: test_trisolve2.test_trisolve(10)
-7
||x - x_hat|| =  3.88623414288
In [30]: test_trisolve2.test_trisolve(10)
0
||x - x_hat|| =  2.60190676562
In [31]: test_trisolve2.test_trisolve(10)
0
||x - x_hat|| =  3.86631743386
In [32]: test_trisolve2.test_trisolve(10)
Segmentation fault
Usually LAPACKE_dgtsv returns with code 0 (which should indicate success), but occasionally I get -7, which means that argument 7 (b) had an illegal value. What's happening is that only the first value of b is actually being modified in place. If I keep on calling test_trisolve I will eventually hit a segfault even when n is small.
OK, I figured it out eventually - it seems I've misunderstood what row- and column-major refer to in this case.
Since C-contiguous arrays follow row-major order, I assumed that I ought to specify LAPACK_ROW_MAJOR as the first argument to LAPACKE_dgtsv.
In fact, if I change
info = LAPACKE_dgtsv(LAPACK_ROW_MAJOR, ...)
to
info = LAPACKE_dgtsv(LAPACK_COL_MAJOR, ...)
then my function works:
test_trisolve2.test_trisolve()
0
||x - x_hat|| =  6.67064747632e-12
This seems pretty counter-intuitive to me - can anyone explain why this is the case?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With