I currently want to multiply a large sparse matrix(~1M x 200k) with its transpose. The values of the resulting matrix would be in float.
What is the efficient way to achieve this multiplication? Because I see a pattern in the computation.
I would like to know what libraries can achieve the computation faster. It can be in Python, R, C, C++ or any other one.
To Multiply the matrices, we first calculate transpose of the second matrix to simplify our comparisons and maintain the sorted order. So, the resultant matrix is obtained by traversing through the entire length of both matrices and summing the appropriate multiplied values.
In linear algebra, the Strassen algorithm, named after Volker Strassen, is an algorithm for matrix multiplication. It is faster than the standard matrix multiplication algorithm for large matrices, with a better asymptotic complexity, although the naive algorithm is often better for smaller matrices.
Multiplying Larger Matrices For the entry in the ith row and the jth column of the product matrix, multiply each entry in the ith row of the first matrix by the corresponding entry in the jth column of the second matrix and adding the results.
def URMdist(URM):
NLin=URM.shape[0]
NCol=URM.shape[1]
URMt=URM.T
Result = lil_matrix((NLin,NLin))
for Lin in range(0,NLin):
X = 0.0
for Col in range(Lin,NLin):
X = URM[Col,:].dot(URMt[:,Lin])
if X != 0.0:
Result[Lin,Col] = Result[Col,Lin] = X
return Result
I suppose your main need is to save memory. First as you multiply a matrix with its transpose, you do not need any memeory for the transpose : all of its cells are directly accessible through first matrix (tA[i,j] = A[j,i]). Almost 1/3 of memory saved.
I can see that computation time cannot be neglected too. As the resulting matrix will be symetric, you can compute only one half and directly store the other. Near half of computation time saved.
And if you are sure that you initial matrix is really sparse, and so can hope the resulting one will be too, you can directly store the result in a scipy sparse matrix, COO format : only three lists to store the non null values.
But ... I do not know any library to do that and you will have to code it yourself in your prefered language (probably python as you spoke of scipy).
Python code example (matrix = A[M][N])
I = []
J = []
V = []
for i in range(M):
for j in range(i:M) :
X = 0.0
for k in range(N):
X += A[i ][k] * A[k][j]
if X != 0.0 # or abs (X) > epsilon if floating point accuracy is a concern ...
I.append (i )
J.append(j)
V.append(X)
I.append (j )
J.append(i)
V.append(X)
And I, J, V are what is needed for a scipy COO sparse matrix via :
RESULT = sparse.coo_matrix((V,(I,J)),shape=(N, N))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With