Perform nonnegative matrix factorization in R

I have a sparse matrix in R and would like to perform nonnegative matrix factorization on it.

data.txt is a text file I created using Python. It has three columns: the first gives the row number, the second the column number, and the third the value.

data.txt

1 5 10
3 2 5
4 6 9

The original data.txt contains 164009 rows, which hold the entries of a 250000x250000 sparse matrix.

I am using the NMF library and running:

x=scan('data.txt',what=list(integer(),integer(),numeric()))
library('Matrix')
R=sparseMatrix(i=x[[1]],j=x[[2]],x=x[[3]]) 
res<-nmf(R,3)

It is giving me an error:

Error in function (classes, fdef, mtable): unable to find an inherited method for function nmf, for signature "dgCMatrix", "missing", "missing"

Could anyone help me figure out what I am doing wrong?

user1344389 asked Oct 08 '22 01:10



2 Answers

The first problem is that you are providing a dgCMatrix to nmf.

> class(R)
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"

The help is here:

help(nmf)

See the Methods section. nmf expects an ordinary dense matrix. Coercing with as.matrix is unlikely to help with your full data, because a dense 250000x250000 numeric matrix has 6.25e10 entries, which is about 500 GB as doubles.
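As a rough check of why dense coercion is impractical here, the sizes can be estimated from the figures in the question. This is only a back-of-the-envelope sketch: it assumes 8-byte doubles for values and 4-byte integers for the row indices and column pointers of a dgCMatrix, and it ignores object overhead.

```r
n   <- 250000   # rows and columns, as stated in the question
nnz <- 164009   # stored entries, as stated in the question

# A dense numeric matrix stores every cell as an 8-byte double.
dense_gb <- n * n * 8 / 1e9
dense_gb

# Compressed sparse column storage: one double plus one integer row index
# per stored entry, plus n + 1 integer column pointers.
sparse_mb <- (nnz * (8 + 4) + (n + 1) * 4) / 1e6
sparse_mb
```

Under these assumptions the dense form needs hundreds of gigabytes while the sparse form needs only a few megabytes.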

Now, even with your example data, coercion to a matrix is insufficient as written:

> nmf(as.matrix(R))
Error: NMF::nmf : when argument 'rank' is not provided, argument 'seed' is required to inherit from class 'NMF'. See ?nmf.

Let's give it a rank:

> nmf(as.matrix(R),2)
Error in .local(x, rank, method, ...) : 
  Input matrix x contains at least one null row.

And indeed it does:

> R
4 x 6 sparse Matrix of class "dgCMatrix"

[1,] . . . . 10 .
[2,] . . . .  . .
[3,] . 5 . .  . .
[4,] . . . .  . 9
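
One possible way around the null-row error is to drop rows that have no stored entries before factorizing. The sketch below rebuilds the example matrix from the three sample lines of data.txt and filters it. Dropping empty columns as well is my own precaution, not something the error message asks for, and the names keep_rows, keep_cols and R_clean are made up for illustration.

```r
library(Matrix)

# The three sample triplets from the question (row, column, value)
R <- sparseMatrix(i = c(1, 3, 4), j = c(5, 2, 6), x = c(10, 5, 9))

# Indices of rows and columns that contain at least one nonzero entry
keep_rows <- which(rowSums(R != 0) > 0)
keep_cols <- which(colSums(R != 0) > 0)

# Subset while staying sparse; keep_rows and keep_cols map the
# filtered matrix back to the original row and column numbers
R_clean <- R[keep_rows, keep_cols]
dim(R_clean)
```

The filtered matrix could then be coerced with as.matrix and passed to nmf with a rank. For the full data set the filtered matrix may still be too large to densify.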
Matthew Lundberg answered Oct 12 '22 01:10



Almost 10 years later there are solutions. Here's a fast one.

If you have a 250k-square dgCMatrix in which anywhere near 1% of the entries are nonzero, you need a sparse factorization algorithm.

I wrote RcppML::nmf for large sparse matrices:

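library(Matrix)   # provides rsparsematrix()
library(RcppML)
# Simulate a matrix with 1% nonzero entries; runif keeps the values nonnegative
A <- rsparsematrix(1000, 10000, 0.01, rand.x = runif)
model <- RcppML::nmf(A, k = 10)
str(model)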

That should take a few seconds on a laptop.

You might also check out rsparse::WRMF, although it isn't as fast.
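
To try a sparse solver on a file like the one in the question, the triplets first have to be loaded without ever densifying them. A minimal sketch follows. It reads the three sample lines from a string as a stand-in for data.txt, and it passes dims explicitly on the assumption that the matrix really is 250000x250000 (otherwise sparseMatrix sizes it from the largest indices it sees). The final call is left commented out and simply reuses the call form shown above.

```r
library(Matrix)

# Stand-in for scan('data.txt', ...): the three sample lines from the question
x <- scan(text = "1 5 10\n3 2 5\n4 6 9",
          what = list(integer(), integer(), numeric()),
          quiet = TRUE)

# Build the sparse matrix with explicit dimensions so trailing empty
# rows and columns are not silently dropped
A <- sparseMatrix(i = x[[1]], j = x[[2]], x = x[[3]],
                  dims = c(250000, 250000))

dim(A)      # full dimensions
nnzero(A)   # number of stored nonzero entries

# model <- RcppML::nmf(A, k = 3)   # not run here
```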

zdebruine answered Oct 12 '22 01:10
