I am doing a multiclass prediction with random forest in Spark ML.
Is it possible to get precision/recall for each class label with MulticlassClassificationEvaluator in Spark ML?
Currently, I only see precision/recall combined across all classes.
For example, perfect precision and recall scores result in a perfect F-Measure score:
F-Measure = (2 * Precision * Recall) / (Precision + Recall)
F-Measure = (2 * 1.0 * 1.0) / (1.0 + 1.0)
F-Measure = (2 * 1.0) / 2.0
F-Measure = 1.0
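The same arithmetic can be sketched as a small Scala function. The name fMeasure and the choice to return 0.0 when both inputs are zero are assumptions made for illustration, not part of any Spark API.

```scala
// Harmonic mean of precision and recall.
// Returning 0.0 when both are zero is an assumed convention to avoid dividing by zero.
def fMeasure(precision: Double, recall: Double): Double =
  if (precision + recall == 0.0) 0.0
  else (2 * precision * recall) / (precision + recall)

// Perfect precision and recall, as in the worked example.
val perfect = fMeasure(1.0, 1.0)

// Perfect precision but only half of the positives found.
val skewed = fMeasure(1.0, 0.5)

println(s"perfect=$perfect skewed=$skewed")
```

Because the harmonic mean is pulled toward the smaller of the two values, the second case scores well below the arithmetic average of 0.75.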
A system with high precision but low recall returns very few results, but most of its predicted labels are correct when compared to the true labels. An ideal system with high precision and high recall will return many results, with all results labeled correctly.
Precision-Recall (PR) Curve: A PR curve is simply a graph with Precision values on the y-axis and Recall values on the x-axis. In other words, the PR curve has TP/(TP+FP) on the y-axis and TP/(TP+FN) on the x-axis. Precision is also called the Positive Predictive Value (PPV).
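For a multiclass problem, the same two ratios can be computed one class at a time by treating that class as "positive" and everything else as "negative". Below is a minimal plain-Scala sketch of that idea, with no Spark involved. The data and the helper names precision and recall are made up for illustration.

```scala
// Hypothetical (prediction, label) pairs for a three-class problem.
val pairs: Seq[(Double, Double)] = Seq(
  (0.0, 0.0), (0.0, 0.0), (1.0, 0.0),
  (1.0, 1.0), (1.0, 1.0), (2.0, 1.0),
  (2.0, 2.0), (0.0, 2.0)
)

// True positives for class c: predicted c and actually c.
def truePositives(c: Double): Int =
  pairs.count { case (pred, label) => pred == c && label == c }

// Precision for class c: TP / (TP + FP), i.e. TP over everything predicted as c.
def precision(c: Double): Double = {
  val predictedAsC = pairs.count(_._1 == c)
  if (predictedAsC == 0) 0.0 else truePositives(c).toDouble / predictedAsC
}

// Recall for class c: TP / (TP + FN), i.e. TP over everything that really is c.
def recall(c: Double): Double = {
  val actuallyC = pairs.count(_._2 == c)
  if (actuallyC == 0) 0.0 else truePositives(c).toDouble / actuallyC
}

// Report both ratios for every class that appears as a label.
pairs.map(_._2).distinct.sorted.foreach { c =>
  println(s"class $c: precision=${precision(c)} recall=${recall(c)}")
}
```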
Models need high recall when missing a positive case is costly. For example, predicting cancer or predicting terrorists needs high recall; in other words, you need to keep false negatives to a minimum. It is OK if a non-cancerous tumor is flagged as cancerous, but a cancerous tumor should not be labeled non-cancerous.
Use org.apache.spark.mllib.evaluation.MulticlassMetrics directly,
and then read the metrics it makes available:
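import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType
// adapted from spark git; `dataset` is the DataFrame returned by model.transform(...)
// column names assume the defaults "prediction" and "label"
val predictionAndLabels =
  dataset.select(col("prediction"), col("label").cast(DoubleType)).rdd.map {
    case Row(prediction: Double, label: Double) => (prediction, label)
  }
val metrics = new MulticlassMetrics(predictionAndLabels)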
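Once a MulticlassMetrics object exists, the per-label values can be read by looping over metrics.labels. The sketch below is self-contained for illustration: the local SparkSession and the small set of (prediction, label) pairs are assumptions standing in for your own predictionAndLabels.

```scala
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in a real job, reuse the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("per-class-metrics")
  .getOrCreate()

// Hypothetical (prediction, label) pairs standing in for predictionAndLabels.
val predictionAndLabels = spark.sparkContext.parallelize(Seq(
  (0.0, 0.0), (0.0, 0.0), (1.0, 0.0),
  (1.0, 1.0), (1.0, 1.0), (2.0, 1.0),
  (2.0, 2.0), (0.0, 2.0)
))
val metrics = new MulticlassMetrics(predictionAndLabels)

// metrics.labels holds the distinct label values found in the data.
val perClass = metrics.labels.map { l =>
  (l, metrics.precision(l), metrics.recall(l), metrics.fMeasure(l))
}
perClass.foreach { case (l, p, r, f) =>
  println(f"label=$l%.1f precision=$p%.3f recall=$r%.3f f1=$f%.3f")
}

// Rows are actual classes, columns are predicted classes.
println(metrics.confusionMatrix)

spark.stop()
```

On Spark 3.0 and later, MulticlassClassificationEvaluator itself also accepts metric names such as precisionByLabel and recallByLabel together with setMetricLabel, which may remove the need for the RDD conversion. Check the documentation for your version before relying on it.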