 

Each row of the input matrix needs to contain at least one non-zero entry

I get an error when I run this code:

text_lda <- LDA(text_dtm, k = 2, method = "VEM", control = NULL)

I get the following error: "Each row of the input matrix needs to contain at least one non-zero entry"

I then tried to find the empty rows with these lines:

rowTotals <- apply(text_dtm, 1, sum)
empty.rows <- text_dtm[rowTotals == 0, ]$dimnames[1][[1]]

But that fails with this error instead:

cannot allocate vector of size 3890.8 GB

This is the size of my DTM:

<<DocumentTermMatrix (documents: 1968850, terms: 265238)>>
Non-/sparse entries: 29766814/522184069486
Sparsity           : 100%
Maximal term length: 4000
Weighting          : term frequency (tf)
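The size in the error follows directly from these dimensions. `apply()` coerces a matrix-like argument with `as.matrix()`, so the sparse DTM is expanded into a dense matrix holding one 8-byte double for every document-term pair, including the empty ones. A quick base-R check of that arithmetic, using only the numbers from the printout:

```r
n_docs  <- 1968850          # documents in text_dtm
n_terms <- 265238           # terms in text_dtm

cells <- n_docs * n_terms   # every cell of the dense copy, zero or not
bytes <- cells * 8          # one 8-byte double per cell
gib   <- bytes / 1024^3     # R reports allocation sizes in 1024-based units

print(cells == 29766814 + 522184069486)  # equals non-sparse + sparse entries
print(round(gib, 1))                     # the size quoted in the error
```

The DTM itself stores only the 29,766,814 non-zero entries, so row sums have to be computed on the sparse form rather than through `apply()`.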
asked Nov 21 '25 21:11 by coding


1 Answer

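Compute the row totals without `apply()`, which converts the sparse DTM into a dense matrix, for example with `slam::row_sums()`, and then drop the empty documents from the corpus:

rowTotals <- slam::row_sums(text_dtm)    # sums only the stored entries; no dense copy
empty.rows <- text_dtm[rowTotals == 0, ]$dimnames[1][[1]]
corpus_new <- corpus[-as.numeric(empty.rows)]    # assumes document names are their positions 1..n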
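The memory error comes from `apply()` densifying the DTM; summing rows on the sparse representation avoids that. A minimal sketch, assuming the slam package (which tm uses to store DTMs) is installed, on a hypothetical 4 x 3 toy matrix standing in for `text_dtm` (`toy_dtm`, the document names and the terms are all made up):

```r
library(slam)

# Hypothetical toy stand-in for text_dtm: 4 documents x 3 terms,
# where document "d3" has no non-zero entries.
toy_dtm <- simple_triplet_matrix(
  i = c(1, 1, 2, 4),
  j = c(1, 3, 2, 2),
  v = c(2, 1, 5, 3),
  nrow = 4, ncol = 3,
  dimnames = list(Docs = c("d1", "d2", "d3", "d4"),
                  Terms = c("apple", "banana", "cherry"))
)

# row_sums() adds up only the stored (non-zero) entries, so no dense
# documents x terms matrix is allocated, unlike apply(toy_dtm, 1, sum).
rowTotals <- row_sums(toy_dtm)

# Keep the documents with at least one non-zero entry; the result is
# still a sparse simple_triplet_matrix.
toy_dtm_nonempty <- toy_dtm[rowTotals > 0, ]
print(dim(toy_dtm_nonempty))
```

On the real data, the same two steps applied to `text_dtm` should yield a sparse matrix that `LDA(..., k = 2, method = "VEM")` accepts without hitting the empty-row check.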

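Alternatively, filter the DTM directly. A tm DocumentTermMatrix is a sparse triplet matrix whose `i` component holds the row index of every non-zero entry, so the distinct values of `text_dtm$i` are exactly the non-empty documents: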

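ui <- sort(unique(text_dtm$i))    # non-empty rows, kept in original document order
text_dtm.new <- text_dtm[ui, ]
text_lda <- LDA(text_dtm.new, k = 2, method = "VEM")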
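Why `$i` identifies the non-empty documents can be checked on a small hypothetical toy matrix (`toy` is made up; the slam package is assumed to be installed). This relies on no explicit zeros being stored, which should be the case for a plain term-frequency DTM:

```r
library(slam)

# Hypothetical toy stand-in for a DocumentTermMatrix (5 documents x 3 terms).
# Entries are listed out of row order, and rows 2 and 4 are empty.
toy <- simple_triplet_matrix(
  i = c(5, 1, 3, 1, 5),
  j = c(1, 2, 3, 3, 2),
  v = c(1, 4, 2, 1, 6),
  nrow = 5, ncol = 3
)

# Every stored triplet is a non-zero entry, so the distinct values of
# toy$i are the non-empty rows. unique() keeps first-appearance order
# (5, 1, 3), so sort() restores the original document order.
ui <- sort(unique(toy$i))
print(ui)

# Same rows as filtering on the sparse row sums.
print(identical(as.integer(ui), unname(which(row_sums(toy) > 0))))
```

The `sort()` only matters for keeping the filtered documents aligned with the original order; `unique()` alone returns them in order of first appearance in `i`.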
answered Nov 23 '25 12:11 by captcoma


