According to the MKL BLAS documentation, "All matrix-matrix operations (level 3) are threaded for both dense and sparse BLAS." http://software.intel.com/en-us/articles/parallelism-in-the-intel-math-kernel-library
I have built Scipy with MKL BLAS. Using the test code below, I see the expected multithreaded speedup for dense, but not sparse, matrix multiplication. Are there any changes I can make to Scipy to enable multithreaded sparse operations?
# test dense matrix multiplication
from numpy import *
import time
x = random.random((10000, 10000))
t1 = time.time()
foo = dot(x.T, x)
print(time.time() - t1)

# test sparse matrix multiplication
from scipy import sparse
x = sparse.rand(10000, 10000)
t1 = time.time()
foo = x.T * x   # scipy.sparse matrices overload * as matrix multiplication
print(time.time() - t1)
As far as I know, the answer is no. But you can build your own wrapper around the MKL sparse multiply routines. You asked about multiplying two sparse matrices; below is some wrapper code I used for multiplying a sparse matrix by a dense vector, so it shouldn't be hard to adapt (look at the Intel MKL reference for mkl_cspblas_dcsrgemm). Also, be aware of how your scipy sparse matrices are stored: the default is COO, but CSR (or CSC) may be a better choice. I chose CSR, but MKL supports most formats (just call the appropriate routine).
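For example, a minimal sketch of getting a matrix into CSR form with scipy's own conversion methods (the size and density here are arbitrary):

from scipy import sparse

# sparse.rand builds a COO matrix by default; MKL's CSR routines need
# compressed sparse row storage, so convert (or ask for CSR directly).
A = sparse.rand(10000, 10000, density=0.01, format='coo')
A_csr = A.tocsr()                       # equivalently: sparse.rand(..., format='csr')
print(sparse.isspmatrix_csr(A_csr))     # True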
From what I could tell, both scipy's default implementation and MKL are multithreaded: by changing OMP_NUM_THREADS I could see a difference in performance.
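If you want to reproduce that experiment, one common approach (just a sketch; it assumes the MKL runtime picks up these environment variables) is to set the thread counts before NumPy/SciPy load MKL for the first time:

import os
# Must be set before numpy/scipy load the MKL runtime.
os.environ['OMP_NUM_THREADS'] = '4'    # OpenMP thread count
os.environ['MKL_NUM_THREADS'] = '4'    # MKL-specific thread count
import numpy as np
import scipy.sparse as sparse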
To use the function below, if you have a recent version of MKL, just make sure LD_LIBRARY_PATH is set to include the relevant MKL directories. For older versions, you need to build some specific libraries. I got my information from Intel MKL in Python.
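A quick sanity check before calling the wrapper is to verify that the MKL runtime is actually loadable (this is just a sketch; adjust the library name if your MKL install uses a different soname):

from ctypes import cdll
try:
    mkl = cdll.LoadLibrary("libmkl_rt.so")   # the same library the wrapper loads
except OSError:
    print("libmkl_rt.so not found; check that LD_LIBRARY_PATH includes the MKL lib directory")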
def SpMV_viaMKL(A, x):
    """
    Wrapper to Intel's SpMV
    (Sparse Matrix-Vector multiply)
    For medium-sized matrices, this is 4x faster
    than scipy's default implementation
    Stephen Becker, April 24 2014
    [email protected]
    """
    import numpy as np
    import scipy.sparse as sparse
    from ctypes import POINTER, c_int, c_char, c_double, byref, cdll

    mkl = cdll.LoadLibrary("libmkl_rt.so")
    SpMV = mkl.mkl_cspblas_dcsrgemv
    # Dissecting the "cspblas_dcsrgemv" name:
    # "c"   - C-style (as opposed to Fortran) interface; also means the sparse
    #         arrays use 0-based indexing, as Python does
    # "sp"  - sparse
    # "d"   - double precision
    # "csr" - compressed sparse row format
    # "ge"  - "general", i.e. the matrix has no special structure such as symmetry
    # "mv"  - matrix-vector multiply

    if not sparse.isspmatrix_csr(A):
        raise Exception("Matrix must be in csr format")
    (m, n) = A.shape

    # The data of the matrix (note: this assumes A.indptr/A.indices are
    # 32-bit ints, which is scipy's default unless the matrix is very large)
    data = A.data.ctypes.data_as(POINTER(c_double))
    indptr = A.indptr.ctypes.data_as(POINTER(c_int))
    indices = A.indices.ctypes.data_as(POINTER(c_int))

    # Allocate output, using same conventions as input
    nVectors = 1
    if x.ndim == 1:
        y = np.empty(m, dtype=np.double, order='F')
        if x.size != n:
            raise Exception("x must have n entries. x.size is %d, n is %d" % (x.size, n))
    elif x.shape[1] == 1:
        y = np.empty((m, 1), dtype=np.double, order='F')
        if x.shape[0] != n:
            raise Exception("x must have n entries. x.shape[0] is %d, n is %d" % (x.shape[0], n))
    else:
        nVectors = x.shape[1]
        y = np.empty((m, nVectors), dtype=np.double, order='F')
        if x.shape[0] != n:
            raise Exception("x must have n entries. x.shape[0] is %d, n is %d" % (x.shape[0], n))

    # Check input
    if x.dtype != np.double:
        x = x.astype(np.double, copy=True)
    # Put it in column-major order, otherwise for nVectors > 1 this FAILS completely
    if not x.flags['F_CONTIGUOUS']:
        x = x.copy(order='F')

    if nVectors == 1:
        np_x = x.ctypes.data_as(POINTER(c_double))
        np_y = y.ctypes.data_as(POINTER(c_double))
        # now call MKL. This returns the answer in np_y, which links to y
        SpMV(byref(c_char(b"N")), byref(c_int(m)), data, indptr, indices, np_x, np_y)
    else:
        for column in range(nVectors):
            xx = x[:, column]
            yy = y[:, column]
            np_x = xx.ctypes.data_as(POINTER(c_double))
            np_y = yy.ctypes.data_as(POINTER(c_double))
            SpMV(byref(c_char(b"N")), byref(c_int(m)), data, indptr, indices, np_x, np_y)
    return y
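Usage would look roughly like this (a sketch; the matrix size and density are arbitrary, and the comparison just checks the result against scipy's built-in product):

import numpy as np
from scipy import sparse

A = sparse.rand(10000, 10000, density=0.001, format='csr')
v = np.random.rand(10000)
y_mkl = SpMV_viaMKL(A, v)
y_scipy = A.dot(v)                     # scipy's own sparse matrix-vector product
print(np.allclose(y_mkl, y_scipy))     # should print True if MKL loaded correctly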