Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting in Sparse Matrix

I have a sparse matrix. I need to sort this matrix row-by-row and create another [sparse] matrix. Code may explain it better:

# for `rand` function, you need newer version of scipy.
from scipy.sparse import *
m = rand(6,6, density=0.6)
d = m.getrow(0)
print d

Output1

(0, 5) 0.874881629788 
(0, 4) 0.352559852239 
(0, 2) 0.504791645463 
(0, 1) 0.885898140175

I have this m matrix. I want to create a new matrix with sorted version of m. The new matrix contains 0'th row like this.

new_d = new_m.getrow(0)
print new_d

Output2

(0, 1) 0.885898140175
(0, 5) 0.874881629788  
(0, 2) 0.504791645463
(0, 4) 0.352559852239

So I can obtain which column is bigger etc:

print new_d.indices

Output3

array([1, 5, 2, 4])

Of course every row should be sorted like above independently.

I have one solution for this problem but it is not elegant.

like image 801
Baskaya Avatar asked Apr 04 '12 08:04

Baskaya


2 Answers

If you're willing to ignore the zero-value elements of the matrix, the code below should work. It is also much faster than implementations that use the getrow method, which is rather slow.

from itertools import izip

def sort_coo(m):
    tuples = izip(m.row, m.col, m.data)
    return sorted(tuples, key=lambda x: (x[0], x[2]))

For example:

    >>> from numpy.random import rand
    >>> from scipy.sparse import coo_matrix
    >>>
    >>> d = rand(10, 20)
    >>> d[d > .05] = 0
    >>> s = coo_matrix(d)
    >>> sort_coo(s)
    [(0, 2, 0.004775589084940246),
     (3, 12, 0.029941507166614145),
     (5, 19, 0.015030386789436245),
     (7, 0, 0.0075044957259399192),
     (8, 3, 0.047994403933129481),
     (8, 5, 0.049401058471327031),
     (9, 15, 0.040011608000125043),
     (9, 8, 0.048541825332137023)]

Depending on your needs you may want to tweak the sort keys in the lambda or further process the output. If you want everything in a row indexed dictionary you could do:

from collections import defaultdict

sorted_rows = defaultdict(list)

for i in sort_coo(m):
     sorted_rows[i[0]].append((i[1], i[2]))
like image 70
Alexander Measure Avatar answered Nov 15 '22 19:11

Alexander Measure


My bad solution is like this:

from scipy.sparse import coo_matrix
import numpy as np
a = []
for i in xrange(m.shape[0]): # assume m is square matrix.
   d = m.getrow(i)
   n = len(d.indices)
   s = zip([i]*n, d.indices, d.data)
   sorted_s = sorted(s, key=lambda v: v[2], reverse=True)
   a.extend(sorted_s)
a = np.array(a)
new_m = coo_matrix((a[:,2], (a[:,0], a[:,1])), m.shape)

There can be some simple mistakes above because I have not checked it yet. But the idea is intuitive, I guess. Is there any good solution?

Edit

This new matrix creation may be useless because if you call getrow method then the order is broken again. Only coo_matrix.col keeps the order.

Another Solution

This one is not exact solution but it may be helpful:

def sortSparseMatrix(m, rev=True, only_indices=True):

    """ Sort a sparse matrix and return column index dictionary
    """
    col_dict = dict() 
    for i in xrange(m.shape[0]): # assume m is square matrix.
        d = m.getrow(i)
        s = zip(d.indices, d.data)
        sorted_s = sorted(s, key=lambda v: v[1], reverse=True)
        if only_indices:
            col_dict[i] = [element[0] for element in sorted_s]
        else:
            col_dict[i] = sorted_s
    return col_dict

>>> print sortSparseMatrix(m)
{0: [5, 1, 0],
 1: [1, 3, 5],
 2: [1, 2, 3, 4],
 3: [1, 5, 2, 4],
 4: [0, 3, 5, 1],
 5: [3, 4, 2]}
like image 38
Baskaya Avatar answered Nov 15 '22 19:11

Baskaya