How do I use Spark's Feature Importance on Random Forest?

Question

The Random Forest documentation does not mention feature importances. However, the feature is listed as resolved on the Spark Jira and is present in the source code. Elsewhere, the documentation also says: "The main differences between this API and the original MLlib ensembles API are:

support for DataFrames and ML Pipelines
separation of classification vs. regression
use of DataFrame metadata to distinguish continuous and categorical features
more functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification."

However, I cannot figure out a syntax that works to call this new feature.

scala> model
res13: org.apache.spark.mllib.tree.model.RandomForestModel = 
TreeEnsembleModel classifier with 10 trees

scala> model.featureImportances
<console>:60: error: value featureImportances is not a member of org.apache.spark.mllib.tree.model.RandomForestModel
              model.featureImportances

Climbs_lika_Spyder · Accepted Answer

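You have to use the new Random Forests in the spark.ml package; the spark.mllib RandomForestModel in your session has no featureImportances member, as the compiler error shows. Check your imports. The OLD: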

import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel

The NEW Random Forests use:

import org.apache.spark.ml.classification.RandomForestClassificationModel
import org.apache.spark.ml.classification.RandomForestClassifier

This S.O. answer provides code for extracting the importances.
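
As a rough sketch of what such extraction code can look like (not the code from the linked answer), assuming Spark 2.x or later, where the DataFrame-based API uses org.apache.spark.ml.linalg, and a local SparkSession. The object name, toy data, column names and seed are all made up for illustration:

```scala
import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Hypothetical object name; the data and settings are illustrative only.
object FeatureImportanceSketch {

  // Fits a small forest on toy data and returns (featureIndex, importance)
  // pairs, highest importance first.
  def rankedImportances(spark: SparkSession): Array[(Int, Double)] = {
    // Toy data: the label follows the first feature; the other two are noise.
    val training = spark.createDataFrame(Seq(
      (0.0, Vectors.dense(0.0, 5.0, 1.0)),
      (0.0, Vectors.dense(0.1, 3.0, 2.0)),
      (0.0, Vectors.dense(0.2, 4.0, 1.0)),
      (0.0, Vectors.dense(0.3, 2.0, 2.0)),
      (1.0, Vectors.dense(0.7, 5.0, 1.0)),
      (1.0, Vectors.dense(0.8, 3.0, 2.0)),
      (1.0, Vectors.dense(0.9, 4.0, 1.0)),
      (1.0, Vectors.dense(1.0, 2.0, 2.0))
    )).toDF("label", "features")

    val rf = new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(10)
      .setSeed(42L)

    // fit returns the spark.ml model type, which exposes featureImportances.
    val model: RandomForestClassificationModel = rf.fit(training)

    // featureImportances is a Vector with one entry per feature.
    model.featureImportances.toArray
      .zipWithIndex
      .map { case (score, idx) => (idx, score) }
      .sortBy { case (_, score) => -score }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rf-feature-importance-sketch")
      .getOrCreate()
    try {
      rankedImportances(spark).foreach { case (idx, score) =>
        println(f"feature $idx: $score%.4f")
      }
    } finally {
      spark.stop()
    }
  }
}
```

The key point is that featureImportances is called on a RandomForestClassificationModel returned by RandomForestClassifier.fit, not on the spark.mllib RandomForestModel from the question.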

This S.O. answer explains the sparse vector that is returned.
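
To illustrate how a sparse importance vector can be read, here is a minimal sketch that needs no cluster. It assumes Spark 2.x or later (org.apache.spark.ml.linalg); the vector is built by hand as a stand-in for model.featureImportances, and the feature names and the rank helper are hypothetical:

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Hypothetical helper object for illustration.
object SparseImportanceSketch {

  // Pairs each feature name with its importance, highest first.
  // Features absent from the sparse vector get an importance of 0.0.
  def rank(importances: Vector, names: Seq[String]): Seq[(String, Double)] = {
    require(importances.size == names.length, "one name per feature expected")
    names.zip(importances.toArray).sortBy { case (_, score) => -score }
  }

  def main(args: Array[String]): Unit = {
    // Hand-built stand-in for model.featureImportances:
    // 4 features in total, non-zero importance only at indices 0 and 2.
    val importances = Vectors.sparse(4, Array(0, 2), Array(0.7, 0.3))

    // A sparse vector prints as (size,[indices],[values]).
    println(importances)

    rank(importances, Seq("a", "b", "c", "d")).foreach { case (name, score) =>
      println(s"$name: $score")
    }
  }
}
```

Calling toArray expands the sparse form into one value per feature, which makes it easier to line the importances up with the original column names.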

Tags: scala, apache-spark, random-forest, apache-spark-mllib