I am trying to convert the following simple triplet matrix, created with TermDocumentMatrix() from the tm package
A term-document matrix (317443 terms, 86960 documents)
Non-/sparse entries: 18472230/27586371050
Sparsity : 100%
Maximal term length: 653
Weighting : term frequency (tf)
of class
[1] "TermDocumentMatrix" "simple_triplet_matrix"
to a dense matrix.
But
dense <- as.matrix(tdm)
generates the error
Error in vector(typeof(x$v), nr * nc) : vector size cannot be NA
In addition: Warning message:
In nr * nc : NAs produced by integer overflow
I can't really understand the error and warning message. Trying to replicate the error on a small dataset with
library(tm)
data("crude")
tdm <- TermDocumentMatrix(crude)
as.matrix(tdm)
doesn't produce the same issue. I saw from this answer that a similar problem was solved through the slam package (even though the question was about a sum operation and not a transformation into a dense matrix). I browsed the slam documentation but I couldn't find any specific function to transform an object of class simple_triplet_matrix into an object of class matrix.
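You get the error because, as noted in the comments, you hit the integer limit. as.matrix() computes nr * nc with integer arithmetic, and 317443 * 86960 = 27604843280 is far larger than .Machine$integer.max (2147483647), so the product overflows to NA. That is to be expected with such a huge number of terms and documents. This reproduces the NA: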
as.integer(.Machine$integer.max+1)
[1] NA
Warning message:
NAs introduced by coercion
The vector() function, which takes the length as its second argument, then fails because that argument is NA.
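To see the overflow with the actual dimensions, here is a minimal base-R sketch. The two numbers are taken from the printed summary in the question; the variable names are only illustrative.

```r
nr <- 317443L  # number of terms, from the printed summary
nc <- 86960L   # number of documents, from the printed summary

# Integer multiplication overflows and returns NA; R also emits the
# "NAs produced by integer overflow" warning, suppressed here.
cells_int <- suppressWarnings(nr * nc)
is.na(cells_int)

# In double precision the same product is computed without overflow
# (317443 * 86960 = 27604843280).
cells_dbl <- as.numeric(nr) * as.numeric(nc)
cells_dbl > .Machine$integer.max
```

Both checks evaluate to TRUE, which is why the length passed to vector() ends up as NA.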
One solution is to redefine as.matrix.simple_triplet_matrix without calling vector(). For example:
as.matrix.simple_triplet_matrix <-
function (x, ...)
{
    nr <- x$nrow
    nc <- x$ncol
    ## old line: y <- matrix(vector(typeof(x$v), nr * nc), nr, nc)
    y <- matrix(0, nr, nc)      ## avoids the integer product nr * nc
    y[cbind(x$i, x$j)] <- x$v   ## fill in the non-zero entries
    dimnames(y) <- x$dimnames
    y
}
But I am not sure it is a good idea to coerce such a sparse matrix (sparsity of about 100%) to a dense one, since the dense result has to hold every zero cell in memory as well.
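A rough back-of-the-envelope estimate shows the scale of the problem. This sketch assumes 8 bytes per cell for a dense numeric matrix and, for the triplet form, 4-byte integer indices plus an 8-byte value per non-zero entry; the actual storage type of the values may differ, so treat the figures as approximate.

```r
cells   <- 317443 * 86960   # total cells, computed in double precision
nonzero <- 18472230         # non-sparse entries, from the printed summary

# Assumption: a dense numeric matrix uses 8 bytes per cell
dense_gb <- cells * 8 / 1e9

# Assumption: triplet storage uses integer i and j (4 bytes each)
# and a double v (8 bytes) per non-zero entry
triplet_gb <- nonzero * (4 + 4 + 8) / 1e9

round(dense_gb)        # on the order of 220 GB
round(triplet_gb, 2)   # on the order of 0.3 GB
```

Under these assumptions the dense matrix would need several hundred times more memory than the sparse representation.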
EDIT
One idea is to use sparseMatrix() from the Matrix package. Here is an example where I compare the objects generated by each coercion. You gain at least a factor of 10 in size by using sparseMatrix() (given how sparse your matrix is, I think you will gain more). Moreover, addition and multiplication are supported for sparse matrices.
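library(tm)
library(Matrix)
data("crude")
dtm <- TermDocumentMatrix(crude,
                          control = list(weighting = weightTfIdf,
                                         stopwords = TRUE))
## sparse copy; dims and dimnames are passed so nothing is lost
sparse <- sparseMatrix(dtm$i, dtm$j, x = dtm$v,
                       dims = c(dtm$nrow, dtm$ncol),
                       dimnames = dimnames(dtm))
dense <- as.matrix(dtm)
## check sizes: how many times larger the dense matrix is
floor(as.numeric(object.size(dense) / object.size(sparse)))
## addition and element-wise multiplication are supported
sparse + sparse
sparse * sparse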
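Following the same idea, it is often possible to do the aggregation on the sparse object and convert only a small subset to a dense matrix. The sketch below uses a hypothetical plain list, stm, with made-up values and the same field names as a simple_triplet_matrix (i, j, v, nrow, ncol, dimnames), so that it runs with the Matrix package alone; the threshold of 3 is arbitrary.

```r
library(Matrix)

# Hypothetical stand-in for a simple_triplet_matrix (made-up values)
stm <- list(
  i = c(1L, 1L, 2L, 4L),   # row (term) indices of the non-zero entries
  j = c(1L, 3L, 2L, 3L),   # column (document) indices
  v = c(2, 1, 5, 3),       # the non-zero values
  nrow = 4L, ncol = 3L,
  dimnames = list(c("oil", "price", "opec", "crude"),
                  c("d1", "d2", "d3"))
)

# Build the sparse matrix, passing dims so empty rows are kept
sp <- sparseMatrix(i = stm$i, j = stm$j, x = stm$v,
                   dims = c(stm$nrow, stm$ncol),
                   dimnames = stm$dimnames)

# Aggregate on the sparse object, without densifying
term_totals <- rowSums(sp)

# Keep only terms with a total of at least 3, then densify the subset
keep  <- term_totals >= 3
small <- as.matrix(sp[keep, , drop = FALSE])
dim(small)
```

Only the rows that survive the filter are ever stored densely, which keeps memory use proportional to the subset rather than to the full term-document matrix.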