I have a Python list of row information for a sparse matrix. Each row is represented as a list of (column, value) tuples. Call it alist
:
alist = [[(1,10), (3,-3)],
[(2,12)]]
How can I efficiently construct a scipy sparse matrix from this list of lists, resulting in a matrix like this:
0 10 0 -3
0 0 12 0
The obvious approach is to make a scipy.sparse.lil_matrix
, which internally has this "list of lists" structure. But from the scipy.sparse.lil_matrix — SciPy v0.19.0 Reference Guide I see just three ways of constructing them:
So the only way to get fresh data in is either to solve this problem with some other sparse matrix representation, or to start with a dense array, neither of which address the initial problem, and both of which seem likely to be less efficient representations than lil_matrix
itself for this data.
I guess I can make an empty one, and use a loop to add values, but surely I'm missing something.
The scipy documentation is really frustrating when it comes to sparse matrices.
Description. S = sparse( A ) converts a full matrix into sparse form by squeezing out any zero elements. If a matrix contains many zeros, converting the matrix to sparse storage saves memory. S = sparse( m,n ) generates an m -by- n all zero sparse matrix.
Now to keep track of non-zero elements in a sparse matrix we have 3-tuple method using an array. Elements of the first row represent the number of rows, columns and non-zero values in the sparse matrix. Elements of the other rows give information about the location and value of non-zero elements.
Linked List representation of the sparse matrix. In a linked list representation, the linked list data structure is used to represent the sparse matrix. The advantage of using a linked list to represent the sparse matrix is that the complexity of inserting or deleting a node in a linked list is lesser than the array.
Sparse matrix is a matrix which contains very few non-zero elements. When a sparse matrix is represented with a 2-dimensional array, we waste a lot of space to represent that matrix. For example, consider a matrix of size 100 X 100 containing only 10 non-zero elements.
If you have the whole alist
before creating the sparse matrix, there is no need to use lil_matrix
, since that is optimized for incrementally updating the sparse matrix.
If you want to do any sort of arithmetic with the matrix afterwords, csr_matrix
is probably your best choice. You can construct the csr_matrix
directly by using (data, (row, column))
format, like this:
In [40]: alist = [[(1,10), (3,-3)],
...: [(2,12)]]
In [41]: i, j, data = zip(*((i, t[0], t[1]) for i, row in enumerate(alist) for t in row))
In [42]: (i, j, data)
Out[42]: ((0, 0, 1), (1, 3, 2), (10, -3, 12))
In [43]: csr_matrix((data, (i, j)), shape=(2, 4)).todense()
Out[43]:
matrix([[ 0, 10, 0, -3],
[ 0, 0, 12, 0]], dtype=int64)
If efficiency is a real concern, you can create the csr_matrix
internal format (using indptr) directly:
In [57]: indptr = np.cumsum([0] + [len(row) for row in alist])
In [58]: j, data = zip(*(t for row in alist for t in row))
In [59]: csr_matrix((data, j, indptr), shape=(2, 4)).todense()
Out[59]:
matrix([[ 0, 10, 0, -3],
[ 0, 0, 12, 0]])
If you're converting to pandas afterwords, coo_matrix
is the way to go, since pandas converts to coo_matrix
anyway:
In [41]: i, j, data = zip(*((i, t[0], t[1]) for i, row in enumerate(alist) for t in row))
In [43]: coo_matrix((data, (i, j)), shape=(2, 4))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With