Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Large matrix multiplication in Python - what is the best option?

I have two boolean sparse square matrices of c. 80,000 x 80,000 generated from 12BM of data (and am likely to have orders of magnitude larger matrices when I use GBs of data).

I want to multiply them (which produces a triangular matrix - however I dont get this since I don't limit the dot product to yield a triangular matrix).

I am wondering what the best way of multiplying them is (memory-wise and speed-wise) - I am going to do the computation on a m2.4xlarge AWS instance which has >60GB of RAM. I would prefer to keep the calc in RAM for speed reasons.

I appreciate that SciPy has sparse matrices and so does h5py, but have no experience in either.

Whats the best option to go for?

Thanks in advance

UPDATE: sparsity of the boolean matrices is <0.6%

like image 377
user7289 Avatar asked Dec 04 '13 19:12

user7289


People also ask

Is Matmul faster than dot?

matmul and both outperform np. dot . Also note, as explained in the docs, np.

What is the best matrix multiplication algorithm?

In linear algebra, the Strassen algorithm, named after Volker Strassen, is an algorithm for matrix multiplication. It is faster than the standard matrix multiplication algorithm for large matrices, with a better asymptotic complexity, although the naive algorithm is often better for smaller matrices.

Is NumPy matrix multiplication faster?

Faster libraries: Numpy As an example, let's compute matrix powers. Specifically, we compute A16 where A is a 100×100 matrix. Our plain Python solution takes 11.77 seconds to run, while using Numpy to perform the multiplications and generate the matrices takes 0.0097 seconds to run.


1 Answers

If your matrices are relatively empty it might be worthwhile encoding them as a data structure of the non-False values. Say a list of tuples describing the location of the non-False values. Or a dictionary with the tuples as the keys.

If you use e.g. a list of tuples you could use a list comprehension to find the items in the second list that can be multiplied with an element from the first list.

a = [(0,0), (3,7), (5,2)] # et cetera
b = ... # idem

for r, c in a:
    res = [(r, k) for j, k in b if k == j]
like image 116
Roland Smith Avatar answered Sep 27 '22 16:09

Roland Smith