
Calculate Similarity of Sparse Matrix

I am using Python with the numpy, scipy and scikit-learn modules.

I'd like to classify the arrays in a very big sparse matrix (100,000 * 100,000).

Every value in the matrix is either 0 or 1. The only thing I have is the indices of the entries equal to 1:

a = [1,3,5,7,9] 
b = [2,4,6,8,10]

which means

a = [0,1,0,1,0,1,0,1,0,1,0]
b = [0,0,1,0,1,0,1,0,1,0,1]
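
To make the correspondence between the index lists and the 0/1 arrays concrete, here is a minimal sketch using plain numpy. The helper name `to_dense` and the length of 11 are assumptions made for this toy example; at the real size of 100,000 columns a dense array per row is exactly what a sparse format is meant to avoid.

```python
import numpy as np

a = [1, 3, 5, 7, 9]
b = [2, 4, 6, 8, 10]
length = 11  # toy size for illustration; 100000 in the real data


def to_dense(ones_at, length):
    """Return a 0/1 vector that has ones at the given positions (hypothetical helper)."""
    v = np.zeros(length, dtype=np.int8)
    v[ones_at] = 1  # fancy indexing sets every listed position at once
    return v


dense_a = to_dense(a, length)
dense_b = to_dense(b, length)
```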

How can I convert these index arrays into a sparse array in scipy?

How can I classify those arrays quickly?

Thank you very much.

asked Jul 19 '13 10:07 by Jimmy Lin


1 Answer

If you choose the sparse coo_matrix format, you can create the matrix by passing the indices directly:

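from scipy.sparse import coo_matrix
import numpy as np

nrows = 100000
ncols = 100000
# Coordinates of the nonzero entries: entry k sits at (row[k], col[k])
row = np.array([1, 3, 5, 7, 9])
col = np.array([2, 4, 6, 8, 10])
# One stored value per coordinate; every nonzero entry is 1
values = np.ones(col.size)
m = coo_matrix((values, (row, col)), shape=(nrows, ncols), dtype=float)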
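
Note that the snippet above pairs the two lists as (row, column) coordinates. If each index list is instead meant to be one row of the matrix, as the 0/1 arrays in the question suggest, a CSR matrix can be built from the lists directly and the row-to-row similarity computed with a sparse product. The following is only a sketch under assumptions: the third list `c`, the toy width of 11, and the choice of cosine similarity are not from the question, and every row is assumed to contain at least one 1.

```python
import numpy as np
from scipy.sparse import csr_matrix

# One list of column indices (positions of the ones) per row.
index_lists = [
    [1, 3, 5, 7, 9],   # a, from the question
    [2, 4, 6, 8, 10],  # b, from the question
    [1, 3, 5, 8, 10],  # c, made up for illustration
]
ncols = 11  # toy width; 100000 in the question

# CSR layout: `indices` holds all column indices back to back, and
# indptr[i]:indptr[i + 1] is the slice of `indices` that belongs to row i.
indices = np.concatenate(index_lists)
indptr = np.concatenate(([0], np.cumsum([len(r) for r in index_lists])))
data = np.ones(indices.size)

m = csr_matrix((data, indices, indptr), shape=(len(index_lists), ncols))

# For 0/1 rows, (m @ m.T)[i, j] counts the columns where rows i and j are both 1.
overlap = (m @ m.T).toarray()  # dense only because this example is tiny
sizes = overlap.diagonal()     # number of ones in each row

# Cosine similarity; assumes no row is empty (otherwise this divides by zero).
cosine = overlap / np.sqrt(np.outer(sizes, sizes))
```

With 100,000 rows the full pairwise result has 10^10 entries, so in practice it would probably have to stay sparse or be computed in blocks of rows rather than converted with `toarray()`. scikit-learn's `sklearn.metrics.pairwise.cosine_similarity` also accepts sparse input if you prefer not to normalize by hand.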
answered Oct 25 '22 01:10 by Saullo G. P. Castro