I want to know how to efficiently add sparse matrices in Python.
I have a program that breaks a big task into subtasks and distributes them across several CPUs. Each subtask yields a result (a scipy sparse matrix in lil_matrix format).
The sparse matrix dimensions are 100000x500000, which is huge, so I really need the most efficient way to sum all the resulting sparse matrices into a single sparse matrix, ideally using a C-compiled method.
A sparse matrix can be stored in full-matrix storage mode or a packed storage mode. When a sparse matrix is stored in full-matrix storage mode, all its elements, including its zero elements, are stored in an array.
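To make the difference concrete, here is a small sketch that holds the same matrix both ways. It assumes scipy's COO format is a fair example of a "packed" storage mode (the passage above does not name a specific format), and the 3x3 values are made up for illustration. The last line estimates what full storage would cost at the size mentioned in the question, assuming 8-byte floats.

```python
import numpy as np
from scipy import sparse

# Full-matrix storage: every element, zeros included, lives in the array.
dense = np.array([[0.0, 3.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [4.0, 0.0, 0.0]])

# Packed storage (COO used here as an example): only the nonzero
# entries are kept, as parallel row / column / value arrays.
packed = sparse.coo_matrix(dense)
rows, cols, values = packed.row, packed.col, packed.data

# Rough cost of full storage for a 100000x500000 matrix of float64.
full_bytes = 100_000 * 500_000 * 8  # 4e11 bytes, about 400 GB
```

At that size full storage is not an option on an ordinary machine, so the sum has to stay in a sparse format from start to finish.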
Two elements with the same row values are further sorted according to their column values. To add the matrices, we traverse both matrices element by element and insert the smaller element (the one with the smaller row and column index) into the resultant matrix. When both elements sit at the same position, their values are added and the sum is inserted once.
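The merge described above can be sketched in plain Python. This is only an illustration of the idea, not how scipy does it internally: the function name add_sorted_triples and the (row, col, value) tuple layout are assumptions made for this example, and both inputs must already be sorted by row and then column.

```python
def add_sorted_triples(a, b):
    """Add two sparse matrices given as lists of (row, col, value),
    each sorted by (row, col). Hypothetical helper for illustration."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        pos_a, pos_b = a[i][:2], b[j][:2]
        if pos_a < pos_b:
            # Element of a comes first in (row, col) order.
            result.append(a[i])
            i += 1
        elif pos_b < pos_a:
            # Element of b comes first in (row, col) order.
            result.append(b[j])
            j += 1
        else:
            # Same position in both matrices: add the values.
            total = a[i][2] + b[j][2]
            if total != 0:  # keep the result free of explicit zeros
                result.append((pos_a[0], pos_a[1], total))
            i += 1
            j += 1
    # One list is exhausted; the rest of the other is copied unchanged.
    result.extend(a[i:])
    result.extend(b[j:])
    return result


a = [(0, 1, 1.0), (2, 3, 2.0)]
b = [(0, 1, 5.0), (1, 0, 7.0)]
merged = add_sorted_triples(a, b)
```

A pure-Python loop like this would be far too slow for matrices of the size in the question; it is only meant to show what a compiled routine has to do.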
Have you tried timing the simplest method?
matrix_result = matrix_a + matrix_b
The documentation warns this may be slow for LIL matrices, suggesting the following may be faster:
matrix_result = (matrix_a.tocsr() + matrix_b.tocsr()).tolil()
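For more than two matrices, the same idea can be applied in a loop. The following is a minimal sketch, assuming all subtask results have the same shape and fit in memory at once; the helper name sum_sparse is made up for this example. It converts each LIL result to CSR once, adds in CSR, and leaves the total in CSR so that no conversion back to LIL is paid unless you actually need it.

```python
from scipy import sparse


def sum_sparse(matrices):
    """Sum an iterable of same-shaped scipy sparse matrices in CSR format.

    Hypothetical helper; expects at least one matrix.
    """
    it = iter(matrices)
    total = next(it).tocsr()       # convert the first result once
    for m in it:
        total = total + m.tocsr()  # CSR + CSR runs in compiled code
    return total


# Tiny stand-ins for the per-subtask results.
a = sparse.lil_matrix((3, 4))
a[0, 1] = 1.0
a[2, 3] = 2.0
b = sparse.lil_matrix((3, 4))
b[0, 1] = 5.0
b[1, 0] = 7.0

total = sum_sparse([a, b])
```

If you need LIL at the end, call total.tolil() once on the final result rather than after every addition. It is still worth timing this against the plain a + b version on your real data, as suggested above.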