I am a beginner in R and am currently working on a project. I have a huge Document Term Matrix (DTM) that I would like to convert into a data frame. However, because of the limitations of the functions I am using, I am not able to do so.
The method I have been using is to first convert the DTM into a matrix, and then convert that matrix into a data frame:
DF <- data.frame(as.matrix(DTM), stringsAsFactors=FALSE)
This worked perfectly with smaller DTMs. However, when the DTM is too large, I cannot convert it to a matrix, and I get the error shown below:
Error: cannot allocate vector of size 2409.3 Gb
I have searched online for a few days but have not found a solution. I would be really thankful if anyone could suggest the best way to convert a DTM into a data frame (especially when dealing with a large DTM).
The tidytext package actually has a function to do just that. Try the tidy() function, which returns a tibble (basically a fancy data frame that prints nicely). The nice thing about tidy() is that it takes care of the pesky stringsAsFactors=FALSE issue by not converting strings to factors, and it deals nicely with the sparsity of your DTM.
as.matrix() is trying to convert your DTM into a non-sparse matrix with an entry for every document and term, even if the term occurs 0 times in that document, which is what causes your memory usage to balloon. tidy() will instead convert it into a data frame where each document only has the counts for the terms found in it.
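To make the size difference concrete, here is a minimal sketch. It uses the small crude corpus that ships with tm purely as a stand-in for your own documents, and it assumes 8 bytes per cell (as for a numeric matrix) and that the DTM is the usual tm object that keeps its stored counts in the v component.

```r
library(tm)

# Small example corpus bundled with tm; stands in for your own documents
data("crude")
dtm <- DocumentTermMatrix(crude)

# Cells a dense matrix would need: one per (document, term) pair
dense_cells <- as.numeric(nrow(dtm)) * ncol(dtm)

# Entries actually stored in the sparse DTM
stored_entries <- length(dtm$v)

# Rough dense size in GiB, assuming 8 bytes per cell
dense_gib <- dense_cells * 8 / 1024^3

cat("dense cells:              ", dense_cells, "\n")
cat("stored entries:           ", stored_entries, "\n")
cat("approx. dense size (GiB): ", dense_gib, "\n")
```

On a real corpus the gap between the two counts grows quickly, because the number of dense cells is documents times terms while the stored entries only grow with the words that actually appear.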
In your example here you'd run
library(tidytext)
DF <- tidy(DTM)
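If you want to see the shape of the result before running this on the full DTM, a small self-contained sketch might look like the following. The crude corpus from tm is only an illustrative stand-in for your data.

```r
library(tm)
library(tidytext)

# Example corpus bundled with tm; replace with your own DTM
data("crude")
DTM <- DocumentTermMatrix(crude)

# One row per (document, term) pair that is stored in the DTM
DF <- tidy(DTM)

names(DF)   # "document" "term" "count"
head(DF)

# If a plain data.frame is needed rather than a tibble
DF_plain <- as.data.frame(DF)
```

The result is in long format (one row per document-term pair) rather than one column per term, which is what keeps it small.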
There's even a vignette on how to use the tidytext package (which is meant to work in the tidyverse).
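It's possible that as.data.frame(as.matrix(DTM), stringsAsFactors=FALSE) instead of data.frame(as.matrix(DTM), stringsAsFactors=FALSE) might do the trick. Note, though, that both forms still evaluate as.matrix(DTM) first, so the dense matrix must still fit in memory.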
The API documentation notes that as.data.frame() simply coerces a matrix into a data frame, whereas data.frame() creates a new data frame from the input.
as.data.frame(...) -> https://stat.ethz.ch/R-manual/R-devel/library/base/html/as.data.frame.html
data.frame(...) -> https://stat.ethz.ch/R-manual/R-devel/library/base/html/data.frame.html
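One difference between the two calls that is easy to observe is how column names are handled. The sketch below uses a tiny toy matrix with made-up term names (not a real DTM), so treat it as an illustration only; it says nothing about memory use, since the matrix already exists before either call runs.

```r
# Toy matrix with hypothetical term names that are not valid R identifiers
m <- matrix(1:4, nrow = 2,
            dimnames = list(c("doc1", "doc2"), c("a-b", "c d")))

# data.frame() checks names by default (check.names = TRUE) and rewrites them
df1 <- data.frame(m, stringsAsFactors = FALSE)

# as.data.frame() on a matrix keeps the column names as they are
df2 <- as.data.frame(m, stringsAsFactors = FALSE)

names(df1)   # "a.b" "c.d"
names(df2)   # "a-b" "c d"
```

This matters for term matrices because terms containing hyphens, spaces, or leading digits would otherwise be silently renamed.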