Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing rank computation for very large sparse matrices

I have a sparse matrix such as

A =

   (1,1)        1
   (3,1)        1
   (1,2)        1
   (2,2)        1
   (1,3)        1
   (3,3)        1
   (4,3)        1
   (4,4)        1

The full matrix of A can see look like as following:

full(A) =

     1     1     1     0
     0     1     0     0
     1     0     1     0
     0     0     1     1

I want to find the rank of matrix A by fast way(because my matrix can extend to 10000 x 20000). I try to do it by two ways but it give the different result

  1. Convert to full matrix and find rank using

    rank(full(A)) = 3
    
  2. Find the rank using sprank

    sprank(A) = 4
    

The true answer must be 3 that means using first way. However, it take long time to find the rank,especially matrix with large size. I know the reason why the second way give 4 because sprank only tells you how many rows/columns of your matrix have non-zero elements, while rank is reporting the actual rank of the matrix which indicates how many rows of your matrix are linearly independent. sprank(A) is 4 but rank(A) is only 3 because you can write the third row as a linear combination of the other rows, specifically A(2,:) - A(1,:).

My problem is that how to find the rank of a sparse matrix with lowest time consumption

Update: I tried to use some way. However, it reported larger time consumption

%% Create random matrix
 G = sparse(randi(2,1000,1000))-1;
 A=sparse(G) %% Because my input matrix is sparse matrix
 %% Measure performance
>> tic; rank(full(A)); toc
Elapsed time is 0.710750 seconds.
>> tic; svds(A); toc
Elapsed time is 1.130674 seconds.
>> tic; eigs(A); toc
Warning: Only 3 of the 6 requested eigenvalues converged. 
> In eigs>processEUPDinfo at 1472
  In eigs at 365
Elapsed time is 4.894653 seconds.
like image 459
user3051460 Avatar asked Nov 13 '14 12:11

user3051460


1 Answers

I don't know which algorithm is best suited for you and I agree that may be more appropriate to ask on math.stackexchange.com. While I was trying with the random matrix you supply G = sparse(randi(2,1000,1000))-1; I noticed that there is little chance that its rank is <1000, and whatever algorithm you use, it is likely that its performance is very data-dependant. For instance using eigs(G) on a 2000-samples square matrix of rank (resp.) [198,325,503,1026,2000] yields the following performance (in seconds): [0.64,0.90,1.38,1.57,4.00] which shows the performance of the eigs function is strongly related with the rank of the matrix.

I also searched for existing tools and gave a try to spnrank which I think is not so data-dependant (it gives a better performance than eigs for high ranks and worse if the rank is small).

In the end you may want to adapt your technical solution depending on the kind of matrices you are most likely to work with.

like image 149
Emilien Avatar answered Oct 20 '22 17:10

Emilien