 

R DocumentTermMatrix control list not working, silently ignores unknown parameters

I have the following two DTMs:

dtm <- DocumentTermMatrix(t)

dtmImproved <- DocumentTermMatrix(t, 
               control=list(minWordLength = 4, minDocFreq=5))

When I run this, the two DTMs are identical, and dtmImproved still contains words that are only 3 characters long. Why doesn't the minWordLength parameter work? Thank you!

> dtm
A document-term matrix (591 documents, 10533 terms)

Non-/sparse entries: 43058/6181945
Sparsity           : 99%
Maximal term length: 135 
Weighting          : term frequency (tf)
> dtmImproved
A document-term matrix (591 documents, 10533 terms)

Non-/sparse entries: 43058/6181945
Sparsity           : 99%
Maximal term length: 135 
Weighting          : term frequency (tf)
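Because DocumentTermMatrix does not complain about option names it does not recognise, a small check on the control list can catch misspelled or outdated names before the matrix is built. The helper below, check_dtm_control, is hypothetical (it is not part of tm), and its list of known option names is an assumption based on the tm help pages (?termFreq and ?DocumentTermMatrix), so it may be incomplete for your tm version.

```r
# Hypothetical helper (not part of tm): warn about control-list names that
# are not in an assumed list of documented tm options.
# ASSUMPTION: the names below are taken from ?termFreq / ?DocumentTermMatrix
# and may differ between tm versions.
check_dtm_control <- function(control) {
  known <- c("bounds", "dictionary", "language", "removeNumbers",
             "removePunctuation", "stemming", "stopwords", "tokenize",
             "tolower", "wordLengths", "weighting")
  unknown <- setdiff(names(control), known)
  if (length(unknown) > 0) {
    warning("Unrecognized control option(s): ",
            paste(unknown, collapse = ", "), call. = FALSE)
  }
  invisible(unknown)  # return the offending names for programmatic use
}

# Warns about minWordLength and minDocFreq (the options from the question)
check_dtm_control(list(minWordLength = 4, minDocFreq = 5))

# No warning: both names are in the assumed list of known options
check_dtm_control(list(wordLengths = c(4, 15), bounds = list(global = c(5, Inf))))
```

Calling it right before DocumentTermMatrix(t, control = ...) turns a silent no-op into a visible warning, at the cost of keeping the list of known names in sync with the installed tm.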
asked Nov 13 '12 at 18:11 by Artem Sultan



1 Answer

dtmImproved <- DocumentTermMatrix(t, control = list(
  wordLengths = c(4, 15),             # keep terms of 4 to 15 characters
  bounds = list(global = c(5, Inf)))) # keep terms that occur in at least 5 documents

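This solves the problem! In tm the relevant control options are called wordLengths and bounds; names such as minWordLength and minDocFreq are not recognised and are silently ignored, which is why both DTMs came out identical. The lack of proper documentation really gets me down (: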
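To make the effect of the two options concrete, here is a minimal base-R sketch of the same filtering applied to a plain documents-by-terms count matrix. The function filter_terms and the toy matrix are hypothetical illustrations, not tm's implementation; the default word_lengths = c(3, Inf) is meant to mirror tm's default wordLengths, which would explain why 3-character words remained when minWordLength was ignored.

```r
# Hypothetical illustration in base R (NOT tm's implementation) of what
# wordLengths and bounds$global filter in a documents-by-terms count matrix.
filter_terms <- function(m, word_lengths = c(3, Inf), global = c(0, Inf)) {
  len <- nchar(colnames(m))             # length of each term in characters
  doc_freq <- colSums(m > 0)            # number of documents containing each term
  keep <- len >= word_lengths[1] & len <= word_lengths[2] &
          doc_freq >= global[1] & doc_freq <= global[2]
  m[, keep, drop = FALSE]
}

# Toy matrix with made-up counts: 3 documents, 4 terms
m <- matrix(c(1, 0, 2, 0,
              1, 1, 0, 1,
              1, 1, 0, 3),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("doc", 1:3),
                            c("cat", "text", "mining", "corpus")))

# Same settings as the answer, but with a lower document-frequency bound of 2
colnames(filter_terms(m, word_lengths = c(4, 15), global = c(2, Inf)))
# → "text" "corpus"  ("cat" is too short, "mining" occurs in only 1 document)
```

With the defaults, all four terms survive, including the 3-letter "cat", which matches what the question observed when the unrecognised options had no effect.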

answered Nov 01 '22 at 06:11 by Artem Sultan
