I am using Spark 2.0.2 with the "ml" library for machine learning on Datasets. What I want to do is run algorithms with cross validation and extract the following metrics: accuracy, precision, recall, area under ROC, and the confusion matrix. My data labels are binary.
Using the MulticlassClassificationEvaluator I can only get the accuracy of the algorithm by accessing "avgMetrics", and using the BinaryClassificationEvaluator I can get the area under ROC, but I cannot use both at the same time. So, is there a way to extract all of the wanted metrics?
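For reference, a minimal sketch of the setup I mean (the pipeline, the training Dataset, and the "label" column name below are placeholders for my actual ones):

import org.apache.spark.ml.evaluation.{BinaryClassificationEvaluator, MulticlassClassificationEvaluator}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Placeholder setup: "pipeline" and "trainingData" stand in for my actual
// estimator and Dataset; the label column is assumed to be "label".
val accuracyEvaluator = new MulticlassClassificationEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setMetricName("accuracy")

val rocEvaluator = new BinaryClassificationEvaluator()
  .setLabelCol("label")
  .setRawPredictionCol("rawPrediction")
  .setMetricName("areaUnderROC")

val paramGrid = new ParamGridBuilder().build()

// CrossValidator takes a single Evaluator, so avgMetrics only ever holds
// the metric of the one evaluator I pass in (accuracy OR area under ROC).
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEstimatorParamMaps(paramGrid)
  .setEvaluator(accuracyEvaluator)
  .setNumFolds(5)

val cvModel = cv.fit(trainingData)
println(cvModel.avgMetrics.mkString(", "))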
Choosing between Spark MLlib and Spark ML: at first glance, the most obvious difference between MLlib and ML is the data types they work on, with MLlib supporting RDDs and ML supporting DataFrames and Datasets.
Precision and recall are two extremely important model evaluation metrics. Precision is the percentage of your positive predictions that are actually relevant, while recall is the percentage of all relevant instances that your algorithm correctly identifies.
Accuracy tells you how often the model is correct overall. Precision tells you how good the model is at predicting a specific category. Recall tells you how much of that category the model actually detects.
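As a concrete, made-up illustration for a binary label (positive class = 1.0):

// Hypothetical counts for one binary classifier (illustration only):
val tp = 40.0 // true positives
val fp = 10.0 // false positives
val fn = 5.0  // false negatives
val tn = 45.0 // true negatives

val accuracy  = (tp + tn) / (tp + fp + fn + tn) // 0.85: correct predictions overall
val precision = tp / (tp + fp)                  // 0.80: of predicted positives, how many were right
val recall    = tp / (tp + fn)                  // ~0.89: of actual positives, how many were found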
Have you tried using MLlib to evaluate your results?
I transformed the dataset to an RDD, then used MulticlassMetrics from MLlib.
You can see a demo here: Spark DecisionTreeExample.scala
private[ml] def evaluateClassificationModel(
    model: Transformer,
    data: DataFrame,
    labelColName: String): Unit = {
  val fullPredictions = model.transform(data).cache()
  val predictions = fullPredictions.select("prediction").rdd.map(_.getDouble(0))
  val labels = fullPredictions.select(labelColName).rdd.map(_.getDouble(0))
  // Print number of classes for reference.
  val numClasses = MetadataUtils.getNumClasses(fullPredictions.schema(labelColName)) match {
    case Some(n) => n
    case None => throw new RuntimeException(
      "Unknown failure when indexing labels for classification.")
  }
  val accuracy = new MulticlassMetrics(predictions.zip(labels)).accuracy
  println(s"  Accuracy ($numClasses classes): $accuracy")
}
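Extending that idea, here is a sketch of how the same prediction/label RDD also yields the confusion matrix, precision, and recall (assuming the predictions DataFrame has "prediction" and "label" columns of type Double, as in the demo above):

import org.apache.spark.mllib.evaluation.MulticlassMetrics

// Pair each prediction with its true label: RDD[(prediction, label)].
val predictionAndLabels = fullPredictions
  .select("prediction", "label")
  .rdd
  .map(row => (row.getDouble(0), row.getDouble(1)))

val metrics = new MulticlassMetrics(predictionAndLabels)

println(s"Confusion matrix:\n${metrics.confusionMatrix}")
println(s"Accuracy: ${metrics.accuracy}")
// Per-class precision and recall; for binary labels the classes are 0.0 and 1.0.
metrics.labels.foreach { l =>
  println(s"Precision($l) = ${metrics.precision(l)}")
  println(s"Recall($l)    = ${metrics.recall(l)}")
}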
You can follow the official Evaluation Metrics guide provided by Apache Spark. The document covers all of the evaluation metrics you want, including precision, recall, ROC, and the confusion matrix.
Here is the link: https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html
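For the ROC part of the question, that guide also covers BinaryClassificationMetrics from the same mllib package; here is a minimal sketch, assuming the predictions DataFrame has a "probability" vector column and a Double "label" column:

import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// Score = predicted probability of the positive class (index 1), paired with the label.
val scoreAndLabels = fullPredictions
  .select("probability", "label")
  .rdd
  .map { row =>
    val prob = row.getAs[Vector]("probability")
    (prob(1), row.getDouble(1))
  }

val binaryMetrics = new BinaryClassificationMetrics(scoreAndLabels)
println(s"Area under ROC: ${binaryMetrics.areaUnderROC()}")
println(s"Area under PR:  ${binaryMetrics.areaUnderPR()}")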