Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rearrange sparse arrays by swapping rows and columns

I have large but sparse arrays and I want to rearrange them by swapping rows an columns. What is a good way to do this in scipy.sparse?

Some issues

  • I don't think that permutation matrices are well suited for this task, as they like randomly change the sparsity structure. And a manipulation will always 'multiply' all columns or rows, even if there are only a few swaps necessary.

  • What is the best sparse matrix representation in scipy.sparse for this task?

  • Suggestions for implementation are very welcome.

I have tagged this with Matlab as well, since this question might find an answer that is not necessarily scipy specific.

like image 236
Jan Avatar asked Mar 01 '13 10:03

Jan


People also ask

What is matrix reordering?

Matrix reordering is a task to permute the rows and columns of a given observed matrix such that the resulting reordered matrix shows meaningful or interpretable structural patterns.

What is the sparsity pattern of a matrix?

The number of zero-valued elements divided by the total number of elements (e.g., m × n for an m × n matrix) is sometimes referred to as the sparsity of the matrix.


2 Answers

CSC format keeps a list of the row indices of all non-zero entries, CSR format keeps a list of the column indices of all non-zero entries. I think you can take advantage of that to swap things around as follows, and I think there shouldn't be any side-effects to it:

def swap_rows(mat, a, b) :
    mat_csc = scipy.sparse.csc_matrix(mat)
    a_idx = np.where(mat_csc.indices == a)
    b_idx = np.where(mat_csc.indices == b)
    mat_csc.indices[a_idx] = b
    mat_csc.indices[b_idx] = a
    return mat_csc.asformat(mat.format)

def swap_cols(mat, a, b) :
    mat_csr = scipy.sparse.csr_matrix(mat)
    a_idx = np.where(mat_csr.indices == a)
    b_idx = np.where(mat_csr.indices == b)
    mat_csr.indices[a_idx] = b
    mat_csr.indices[b_idx] = a
    return mat_csr.asformat(mat.format)

You could now do something like this:

>>> mat = np.zeros((5,5))
>>> mat[[1, 2, 3, 3], [0, 2, 2, 4]] = 1
>>> mat = scipy.sparse.lil_matrix(mat)
>>> mat.todense()
matrix([[ 0.,  0.,  0.,  0.,  0.],
        [ 1.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  1.],
        [ 0.,  0.,  0.,  0.,  0.]])
>>> swap_rows(mat, 1, 3)
<5x5 sparse matrix of type '<type 'numpy.float64'>'
    with 4 stored elements in LInked List format>
>>> swap_rows(mat, 1, 3).todense()
matrix([[ 0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  1.],
        [ 0.,  0.,  1.,  0.,  0.],
        [ 1.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.]])
>>> swap_cols(mat, 0, 4)
<5x5 sparse matrix of type '<type 'numpy.float64'>'
    with 4 stored elements in LInked List format>
>>> swap_cols(mat, 0, 4).todense()
matrix([[ 0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.],
        [ 0.,  0.,  1.,  0.,  0.],
        [ 1.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.]])

I have used a LIL matrix to show how you could preserve the type of your output. In your application you probably want to already be in CSC or CSR format, and select whether to swap rows or columns first based on it, to minimize conversions.

like image 64
Jaime Avatar answered Oct 19 '22 02:10

Jaime


I've found using matrix operations to be the most efficient. Here's a function which will permute the rows and/or columns to a specified order. It can be modified to swap two specific rows/columns if you would like.

from scipy import sparse

def permute_sparse_matrix(M, row_order=None, col_order=None):
    """
    Reorders the rows and/or columns in a scipy sparse matrix to the specified order.
    """
    if row_order is None and col_order is None:
        return M
    
    new_M = M
    if row_order is not None:
        I = sparse.eye(M.shape[0]).tocoo()
        I.row = I.row[row_order]
        new_M = I.dot(new_M)
    if col_order is not None:
        I = sparse.eye(M.shape[1]).tocoo()
        I.col = I.col[col_order]
        new_M = new_M.dot(I)
    return new_M
like image 44
Jeff Avatar answered Oct 19 '22 03:10

Jeff