
easy sampling of vectors from a sparse matrix, and creating a new matrix from the sample (python)

This question has two parts (maybe one solution?):

Sample vectors from a sparse matrix: Is there an easy way to sample vectors from a sparse matrix? When I try to sample rows using random.sample, I get a TypeError: sparse matrix length is ambiguous.

from random import sample
import numpy as np
from scipy.sparse import lil_matrix
K = 2
m = [[1,2],[0,4],[5,0],[0,8]]
sample(m,K)    #works OK
mm = np.array(m)
sample(mm,K)   #works OK
sm = lil_matrix(m)
sample(sm,K)   #throws exception TypeError: sparse matrix length is ambiguous.

My current solution is to sample from the row indices of the matrix and then use getrow(), something like:

indxSampls = sample(range(sm.shape[0]), K)   # K distinct row indices
sampledRows = []
for i in indxSampls:
    sampledRows.append(sm.getrow(i))         # each entry is a 1 x ncols sparse row

Any other efficient/elegant ideas? The dense matrix size is 1000x30000 and could be larger.

Constructing a sparse matrix from a list of sparse vectors: Now imagine I have the list of sampled vectors sampledRows. How can I convert it to a sparse matrix without densifying it, converting it to a list of lists, and then converting that to a lil_matrix?

ScienceFriction asked Mar 24 '12 21:03



1 Answer

Try

sm[np.random.choice(sm.shape[0], K, replace=False), :]

This returns an LIL-format matrix containing just K of the rows, in the order given by the sampled indices. Because the result is already a sparse matrix, no separate conversion step is needed. I'm not sure it's super-fast, but it shouldn't be worse than accessing the rows one at a time as you're currently doing, and it probably preallocates the result.
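For reference, here is a minimal sketch covering both parts of the question on the small example matrix. It assumes a reasonably recent NumPy (for np.random.default_rng) and SciPy. The seed and the names rng, idx, sampled and stacked are only illustrative. The scipy.sparse.vstack call is not part of the suggestion above; it is shown as one possible way to build a sparse matrix from an existing list of sparse rows.

```python
import numpy as np
from scipy.sparse import lil_matrix, vstack

K = 2
sm = lil_matrix([[1, 2], [0, 4], [5, 0], [0, 8]])

# Draw K distinct row indices (the seed is an arbitrary choice, for repeatability).
rng = np.random.default_rng(0)
idx = rng.choice(sm.shape[0], size=K, replace=False)

# Part 1: index the sparse matrix with the sampled row indices.
# The result stays sparse; nothing is densified.
sampled = sm[idx, :]

# Part 2: if you already hold a list of 1 x ncols sparse rows,
# stack them into a single sparse matrix.
rows = [sm.getrow(i) for i in idx]
stacked = vstack(rows, format="lil")

print(sampled.shape, stacked.shape)
```

Both sampled and stacked should hold the same K rows in the same order, so the vstack step only matters when the rows come from somewhere other than a single indexing call.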

Danica answered Oct 31 '22 01:10
