I am trying to convert the following Simple Triplet Matrix created with TermDocumentMatrix() of the tm package
A term-document matrix (317443 terms, 86960 documents)
Non-/sparse entries: 18472230/27586371050
Sparsity : 100%
Maximal term length: 653
Weighting : term frequency (tf)
of class
[1] "TermDocumentMatrix" "simple_triplet_matrix"
to a dense matrix.
But
dense <- as.matrix(tdm)
generates the error
Error in vector(typeof(x$v), nr * nc) : vector size cannot be NA
In addition: Warning message:
In nr * nc : NAs produced by integer overflow
I can't really understand the error and warning message. Trying to replicate the error on a small dataset with
library(tm)
data("crude")
tdm <- TermDocumentMatrix(crude)
as.matrix(tdm)
doesn't produce the same issue. I saw from this answer that a similar problem was solved through the slam package (even though the question was about a sum operation and not a transformation into a dense matrix). I browsed the slam documentation but I couldn't find any specific function to transform an object of class simple_triplet_matrix into an object of class matrix.
You get this error because, as noted in the comments, the number of cells (nr * nc) exceeds R's integer limit, which is expected with this many terms and documents. The following produces the same kind of NA:
as.integer(.Machine$integer.max+1)
[1] NA
Warning message:
NAs introduced by coercion
The vector() function, which takes the length as its second argument, fails because that argument is NA.
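To see the overflow with the dimensions from the question, the cell count can be computed once with integers and once with doubles. This is only a minimal sketch: the variable names are illustrative and are not taken from the slam source.

```r
# Dimensions reported for the term-document matrix in the question
nr <- 317443L  # terms
nc <- 86960L   # documents

# Integer multiplication overflows and yields NA (with a warning)
cells_int <- suppressWarnings(nr * nc)

# The same product in double precision is representable
cells_dbl <- as.numeric(nr) * as.numeric(nc)

print(cells_int)
print(cells_dbl > .Machine$integer.max)
```

The double-precision product equals the sum of the sparse and non-sparse entry counts printed for the matrix.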
One solution is to redefine as.matrix.simple_triplet_matrix without calling vector. For example:
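as.matrix.simple_triplet_matrix <-
function (x, ...)
{
    nr <- x$nrow
    nc <- x$ncol
    ## old line: y <- matrix(vector(typeof(x$v), nr * nc), nr, nc)
    ## new line: let matrix() allocate the cells, so nr * nc is never computed as an integer
    y <- matrix(0, nr, nc)
    ## fill in the non-zero entries from the (i, j, v) triplets
    y[cbind(x$i, x$j)] <- x$v
    dimnames(y) <- x$dimnames
    y
}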
But I am not sure it is a good idea to coerce such a sparse matrix (100% sparsity) to a dense matrix.
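A rough back-of-the-envelope estimate shows why the dense coercion is questionable here. The sketch below uses the counts printed for the matrix in the question and assumes 8 bytes per dense cell (doubles, as produced by matrix(0, nr, nc)) and, for the triplet form, two 4-byte integer indices plus one 8-byte value per non-zero entry. Object overhead and dimnames are ignored.

```r
n_terms   <- 317443
n_docs    <- 86960
n_nonzero <- 18472230

# Dense storage: every cell holds a double (assumed 8 bytes each)
dense_gib <- n_terms * n_docs * 8 / 2^30

# Triplet storage: i and j as integers, v as double (assumption)
triplet_gib <- n_nonzero * (4 + 4 + 8) / 2^30

print(round(dense_gib))
print(round(triplet_gib, 2))
```

Under these assumptions the dense matrix needs a few hundred GiB, while the triplets fit in well under one GiB.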
EDIT
One idea is to use sparseMatrix from the Matrix package. Here is an example where I compare the objects generated by each coercion. You gain at least a factor of 10 in size by using sparseMatrix (given how sparse your matrix is, I think you will gain more). Moreover, addition and multiplication are supported by sparse matrices.
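library(tm)
library(Matrix)
data("crude")
dtm <- TermDocumentMatrix(crude,
                          control = list(weighting = weightTfIdf,
                                         stopwords = TRUE))
## sparse representation (the name Dense is kept from the original, but this object is sparse)
Dense <- sparseMatrix(dtm$i, dtm$j, x = dtm$v,
                      dims = c(dtm$nrow, dtm$ncol))
## ordinary dense matrix
dense <- as.matrix(dtm)
## check sizes: ratio of dense to sparse object size
floor(as.numeric(object.size(dense)) / as.numeric(object.size(Dense)))
## addition and element-wise multiplication are supported
Dense + Dense
Dense * Dense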
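For readers without the tm data at hand, the same conversion can be sketched on a toy triplet list. The list stm below is hypothetical and only mimics the i, j, v, nrow, ncol and dimnames fields used above; passing dims and dimnames explicitly keeps empty trailing rows or columns and the term/document labels.

```r
library(Matrix)

# Hypothetical stand-in for a simple_triplet_matrix (not real tm output)
stm <- list(
  i = c(1L, 3L, 4L, 2L),   # row (term) indices
  j = c(1L, 1L, 2L, 3L),   # column (document) indices
  v = c(2, 1, 5, 3),       # non-zero values
  nrow = 4L, ncol = 3L,
  dimnames = list(paste0("term", 1:4), paste0("doc", 1:3))
)

# Build the sparse matrix directly from the triplets
sp <- sparseMatrix(i = stm$i, j = stm$j, x = stm$v,
                   dims = c(stm$nrow, stm$ncol),
                   dimnames = stm$dimnames)

# Aggregations work without ever densifying
row_totals <- rowSums(sp)
print(sp)
print(row_totals)
```

Operations such as rowSums stay within the sparse representation, so the full dense matrix never needs to be allocated.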