I have the following two DTMs:
dtm <- DocumentTermMatrix(t)
dtmImproved <- DocumentTermMatrix(t,
control=list(minWordLength = 4, minDocFreq=5))
When I run this, the two DTMs come out identical, and when I inspect
dtmImproved it still contains three-character words. Why doesn't the
minWordLength parameter work? Thank you!
> dtm
A document-term matrix (591 documents, 10533 terms)
Non-/sparse entries: 43058/6181945
Sparsity : 99%
Maximal term length: 135
Weighting : term frequency (tf)
> dtmImproved
A document-term matrix (591 documents, 10533 terms)
Non-/sparse entries: 43058/6181945
Sparsity : 99%
Maximal term length: 135
Weighting : term frequency (tf)
dtmImproved <- DocumentTermMatrix(t, control=list(wordLengths=c(4, 15),
bounds = list(global = c(5,Inf))))
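This solves the problem! As far as I can tell, newer versions of tm use wordLengths and bounds in place of minWordLength and minDocFreq, and control options it does not recognize are silently ignored, which is why both DTMs came out identical. The lack of proper documentation really gets me down (: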
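To check that the filter is actually applied, one option is to look at the lengths of the terms that remain. The following is a minimal sketch, assuming the tm package is installed; the three toy documents and the names docs, corpus, and dtm_filtered are made up for illustration, and the document-frequency bound is lowered to 2 so that the tiny corpus keeps some terms.

```r
library(tm)

# Hypothetical toy corpus, for illustration only
docs <- c("data mining with text data",
          "text mining finds patterns in data",
          "the cat sat on the mat")
corpus <- VCorpus(VectorSource(docs))

# Keep terms of 4 to 15 characters that occur in at least 2 documents
dtm_filtered <- DocumentTermMatrix(
  corpus,
  control = list(wordLengths = c(4, 15),
                 bounds = list(global = c(2, Inf)))
)

# Inspect the surviving terms and their character lengths
Terms(dtm_filtered)
range(nchar(Terms(dtm_filtered)))
```

If the lower end of that range is below 4, the control options are not being picked up.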