I'm using the rpart package for decision tree classification. I have a data frame with around 4000 features (columns). I want to use all features in rpart() for my model. How can I do that? Basically, rpart() will ask me to use the function in this way:
dt <- rpart(class ~ feature1 + feature2 + ....)
My features are words in documents so I have more than 4k features. Each feature is represented by a word. Is there any possibility to use all features without writing them?
rpart is an R package for building classification and regression trees. It implements recursive partitioning and is easy to use.
The rpart() function trains a classification or regression tree. For classification it uses the Gini index as its default class purity metric, which differs from the information entropy computation used in C5.
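As a rough illustration of the splitting criterion mentioned above, the sketch below fits the same tree twice, once with the Gini index and once with the information (entropy) criterion, through the split entry of the parms argument. The built-in iris data set is used purely as a stand-in here; it is not the data from the question.

```r
library(rpart)

# Stand-in data: the built-in iris data set (an assumption for illustration).
# Gini index is the default impurity measure for classification trees.
fit_gini <- rpart(Species ~ ., data = iris, method = "class",
                  parms = list(split = "gini"))

# The same model using the information (entropy) criterion instead.
fit_info <- rpart(Species ~ ., data = iris, method = "class",
                  parms = list(split = "information"))

# Compare the splits chosen under each criterion.
print(fit_gini)
print(fit_info)
```

On a small, well-separated data set the two criteria often pick very similar splits, so any difference is more likely to show up on larger or noisier data.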
From the documentation for the rpart package: minbucket is the minimum number of observations in any terminal node.
The complexity parameter (cp) in rpart is the minimum improvement in the model needed at each node. It's based on the cost complexity of the model defined as… For the given tree, add up the misclassification at every terminal node.
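To make the minbucket and cp settings concrete, here is a minimal sketch that passes both through rpart.control(). The values 5 and 0.01 and the iris stand-in data are arbitrary choices for illustration, not recommendations.

```r
library(rpart)

# Illustrative values only: at least 5 observations per terminal node,
# and a split must improve the overall fit by at least cp = 0.01.
ctrl <- rpart.control(minbucket = 5, cp = 0.01)

# Stand-in data: the built-in iris data set (an assumption for illustration).
fit <- rpart(Species ~ ., data = iris, method = "class", control = ctrl)

# The cp table lists, for each candidate tree size, the cp value and the
# relative and cross-validated error.
printcp(fit)

# Number of observations in each terminal node of the fitted tree.
leaf_sizes <- fit$frame$n[fit$frame$var == "<leaf>"]
print(leaf_sizes)
```

Raising cp or minbucket generally yields a smaller tree, which may matter with thousands of sparse word features.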
I figured it out:
dt <- rpart(class ~ ., data)
"." stands for every column in data other than class, so all features are used without listing them.
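A small self-contained sketch of the same idea follows. The docs data frame and its word columns (offer, meeting, the) are made up for illustration and stand in for a real document-term table with thousands of columns.

```r
library(rpart)

# Hypothetical stand-in for a document-term data frame: one row per document,
# one count column per word, plus the label column `class`.
set.seed(1)
docs <- data.frame(
  class   = factor(rep(c("spam", "ham"), each = 20)),
  offer   = c(rpois(20, 3), rpois(20, 0.2)),
  meeting = c(rpois(20, 0.2), rpois(20, 3)),
  the     = rpois(40, 5)
)

# `.` expands to every column of `docs` except the response `class`.
dt <- rpart(class ~ ., data = docs, method = "class")

# The predictors that the formula expanded to.
used_features <- attr(terms(dt), "term.labels")
print(used_features)
```

If a column such as a document ID should be left out, it can be subtracted in the formula (for example class ~ . - id, where id is a hypothetical column name) or dropped from the data frame before fitting.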