I have large but sparse arrays and I want to rearrange them by swapping rows an columns. What is a good way to do this in scipy.sparse
?
Some issues
I don't think that permutation matrices are well suited for this task, as they like randomly change the sparsity structure. And a manipulation will always 'multiply' all columns or rows, even if there are only a few swaps necessary.
What is the best sparse matrix representation in scipy.sparse
for this task?
Suggestions for implementation are very welcome.
I have tagged this with Matlab as well, since this question might find an answer that is not necessarily scipy
specific.
Matrix reordering is a task to permute the rows and columns of a given observed matrix such that the resulting reordered matrix shows meaningful or interpretable structural patterns.
The number of zero-valued elements divided by the total number of elements (e.g., m × n for an m × n matrix) is sometimes referred to as the sparsity of the matrix.
CSC format keeps a list of the row indices of all non-zero entries, CSR format keeps a list of the column indices of all non-zero entries. I think you can take advantage of that to swap things around as follows, and I think there shouldn't be any side-effects to it:
def swap_rows(mat, a, b) :
mat_csc = scipy.sparse.csc_matrix(mat)
a_idx = np.where(mat_csc.indices == a)
b_idx = np.where(mat_csc.indices == b)
mat_csc.indices[a_idx] = b
mat_csc.indices[b_idx] = a
return mat_csc.asformat(mat.format)
def swap_cols(mat, a, b) :
mat_csr = scipy.sparse.csr_matrix(mat)
a_idx = np.where(mat_csr.indices == a)
b_idx = np.where(mat_csr.indices == b)
mat_csr.indices[a_idx] = b
mat_csr.indices[b_idx] = a
return mat_csr.asformat(mat.format)
You could now do something like this:
>>> mat = np.zeros((5,5))
>>> mat[[1, 2, 3, 3], [0, 2, 2, 4]] = 1
>>> mat = scipy.sparse.lil_matrix(mat)
>>> mat.todense()
matrix([[ 0., 0., 0., 0., 0.],
[ 1., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 1., 0., 1.],
[ 0., 0., 0., 0., 0.]])
>>> swap_rows(mat, 1, 3)
<5x5 sparse matrix of type '<type 'numpy.float64'>'
with 4 stored elements in LInked List format>
>>> swap_rows(mat, 1, 3).todense()
matrix([[ 0., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 1.],
[ 0., 0., 1., 0., 0.],
[ 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
>>> swap_cols(mat, 0, 4)
<5x5 sparse matrix of type '<type 'numpy.float64'>'
with 4 stored elements in LInked List format>
>>> swap_cols(mat, 0, 4).todense()
matrix([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1.],
[ 0., 0., 1., 0., 0.],
[ 1., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 0.]])
I have used a LIL matrix to show how you could preserve the type of your output. In your application you probably want to already be in CSC or CSR format, and select whether to swap rows or columns first based on it, to minimize conversions.
I've found using matrix operations to be the most efficient. Here's a function which will permute the rows and/or columns to a specified order. It can be modified to swap two specific rows/columns if you would like.
from scipy import sparse
def permute_sparse_matrix(M, row_order=None, col_order=None):
"""
Reorders the rows and/or columns in a scipy sparse matrix to the specified order.
"""
if row_order is None and col_order is None:
return M
new_M = M
if row_order is not None:
I = sparse.eye(M.shape[0]).tocoo()
I.row = I.row[row_order]
new_M = I.dot(new_M)
if col_order is not None:
I = sparse.eye(M.shape[1]).tocoo()
I.col = I.col[col_order]
new_M = new_M.dot(I)
return new_M
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With