I have created a DocumentTermMatrix that contains 1859 documents (rows) and 25722 (columns). In order to perform further calculations on this matrix I need to convert it to a regular matrix. I want to use the as.matrix()
command. However, it returns the following error: cannot allocate vector of size 364.8 MB.
> corp
A corpus with 1859 text documents
> mat<-DocumentTermMatrix(corp)
> dim(mat)
[1] 1859 25722
> is(mat)
[1] "DocumentTermMatrix"
> mat2<-as.matrix(mat)
Fehler: kann Vektor der Größe 364.8 MB nicht allozieren # cannot allocate vector of size 364.8 MB
> object.size(mat)
5502000 bytes
For some reason the size of the object seems to increase dramatically whenever it is transformed to a regular matrix. How can I avoid this?
Or is there an alternative way to perform regular matrix operations on a DocumentTermMatrix?
The quick and dirty way is to export your data into a sparse matrix object from an external package like Matrix
.
> attributes(dtm)
$names
[1] "i" "j" "v" "nrow" "ncol" "dimnames"
$class
[1] "DocumentTermMatrix" "simple_triplet_matrix"
$Weighting
[1] "term frequency" "tf"
The dtm object has the i, j and v attributes which is the internal representation of your DocumentTermMatrix. Use:
library("Matrix")
mat <- sparseMatrix(
i=dtm$i,
j=dtm$j,
x=dtm$v,
dims=c(dtm$nrow, dtm$ncol)
)
and you're done.
A naive comparison between your objects:
> mat[1,1:100]
> head(as.vector(dtm[1,]), 100)
will each give you the exact same output.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With