The documentation for Random Forests does not mention feature importances. However, the feature is listed as resolved in JIRA and is present in the source code. HERE also says "The main differences between this API and the original MLlib ensembles API are:
However, I cannot figure out the syntax for calling this new feature.
scala> model
res13: org.apache.spark.mllib.tree.model.RandomForestModel =
TreeEnsembleModel classifier with 10 trees
scala> model.featureImportances
<console>:60: error: value featureImportances is not a member of org.apache.spark.mllib.tree.model.RandomForestModel
model.featureImportances
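You have to use the new Random Forests API in spark.ml; featureImportances is not a member of the old spark.mllib RandomForestModel, which is exactly what the error above says. Check your imports. The OLD: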
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel
The NEW Random Forests use:
import org.apache.spark.ml.classification.RandomForestClassificationModel
import org.apache.spark.ml.classification.RandomForestClassifier
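A minimal sketch of the new-API route, assuming Spark 2.x or later (it uses SparkSession and org.apache.spark.ml.linalg.Vectors; in Spark 1.x the vector classes lived in org.apache.spark.mllib.linalg instead). The object name, toy data and parameter values are made up for illustration. It fits a RandomForestClassifier on a DataFrame with "label" and "features" columns and reads featureImportances from the resulting RandomForestClassificationModel.

```scala
import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object FeatureImportancesSketch {
  // Fits a spark.ml random forest on toy data and returns its feature importances.
  def run(): Array[Double] = {
    // Assumption: Spark 2.x+ on the classpath, running in local mode.
    val spark = SparkSession.builder()
      .appName("rf-feature-importances")
      .master("local[*]")
      .getOrCreate()
    try {
      // Hypothetical toy data: the label depends only on feature 0.
      val training = spark.createDataFrame(Seq(
        (0.0, Vectors.dense(0.0, 1.0, 5.0)),
        (0.0, Vectors.dense(0.1, 3.0, 2.0)),
        (0.0, Vectors.dense(0.2, 2.0, 4.0)),
        (1.0, Vectors.dense(0.9, 1.0, 3.0)),
        (1.0, Vectors.dense(1.0, 3.0, 5.0)),
        (1.0, Vectors.dense(0.8, 2.0, 2.0))
      )).toDF("label", "features")

      // The spark.ml estimator, not org.apache.spark.mllib.tree.RandomForest.
      val rf = new RandomForestClassifier()
        .setLabelCol("label")
        .setFeaturesCol("features")
        .setNumTrees(10)
        .setSeed(42L)

      val model: RandomForestClassificationModel = rf.fit(training)

      // featureImportances is a Vector (which may be sparse) with one entry per feature.
      model.featureImportances.toArray
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    run().zipWithIndex.foreach { case (imp, i) =>
      println(f"feature $i%d -> $imp%.4f")
    }
  }
}
```

The only real change from the old code is which classes you import and that training happens on a DataFrame rather than an RDD[LabeledPoint].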
This S.O. answer provides code for extracting the importances.
This S.O. answer explains the sparse vector that is returned.
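As a rough illustration of working with that vector, here is a sketch that pairs each importance with a feature name and sorts them, most important first. It assumes Spark 2.x's org.apache.spark.ml.linalg classes; the object name ImportanceTable, the ranked helper and the featureNames array are hypothetical, and the names must be in the same order as the columns assembled into the "features" vector.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

object ImportanceTable {
  // Hypothetical helper: pair each feature name with its importance, highest first.
  def ranked(importances: Vector, featureNames: Array[String]): Seq[(String, Double)] = {
    require(importances.size == featureNames.length, "expected one name per feature")
    // toArray expands a sparse vector into a dense array; indices missing from the
    // sparse representation come back as 0.0.
    featureNames.zip(importances.toArray).sortBy { case (_, imp) => -imp }.toSeq
  }

  def main(args: Array[String]): Unit = {
    // Made-up sparse vector: 4 features, only indices 0 and 2 are non-zero.
    val imp = Vectors.sparse(4, Array(0, 2), Array(0.7, 0.3))
    ranked(imp, Array("age", "income", "height", "zip")).foreach(println)
  }
}
```

Expanding with toArray keeps the zero entries, so features that never appear in the sparse indices still show up in the table with importance 0.0.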