How to get accuracy, precision, recall and ROC from cross validation in Spark ml lib?

I am using Spark 2.0.2. I am also using the "ml" library for Machine Learning with Datasets. What I want to do is run algorithms with cross validation and extract the mentioned metrics (accuracy, precision, recall, ROC, confusion matrix). My data labels are binary.

By using the MulticlassClassificationEvaluator I can only get the accuracy of the algorithm, by accessing "avgMetrics". By using the BinaryClassificationEvaluator I can get the area under ROC. But a CrossValidator accepts only one evaluator, so I cannot use both at once. Is there a way to extract all of the wanted metrics?
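For reference, here is a minimal sketch of the setup described above. The logistic regression estimator, the parameter grid, and the training DataFrame (with "features" and "label" columns) are placeholder assumptions, not part of the original question:

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.{BinaryClassificationEvaluator, MulticlassClassificationEvaluator}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql.DataFrame

// "training" must contain "features" and "label" columns (assumption).
def crossValidate(training: DataFrame): Unit = {
  val lr = new LogisticRegression()
  val grid = new ParamGridBuilder()
    .addGrid(lr.regParam, Array(0.01, 0.1))
    .build()

  // A CrossValidator takes exactly one Evaluator, so you must pick one metric:
  val accuracyEval = new MulticlassClassificationEvaluator().setMetricName("accuracy")
  val rocEval = new BinaryClassificationEvaluator().setMetricName("areaUnderROC")

  val cv = new CrossValidator()
    .setEstimator(lr)
    .setEvaluator(accuracyEval) // or rocEval, but not both at once
    .setEstimatorParamMaps(grid)
    .setNumFolds(5)

  val cvModel = cv.fit(training)
  // avgMetrics holds one cross-validated value of the chosen metric per param map.
  println(cvModel.avgMetrics.mkString(", "))
}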

asked Jan 18 '17 by user3309479

People also ask

What is the difference between Spark ML and Spark MLlib?

At first glance, the most obvious difference between MLlib and ML is the data types they work on: MLlib supports RDDs, while ML supports DataFrames and Datasets.
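As a rough illustration of that type difference, here is a sketch computing the same metric through both APIs (both functions assume the conventional "prediction" and "label" column names):

// MLlib (RDD-based): metrics classes consume RDDs of primitive tuples.
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.rdd.RDD

// ml (DataFrame-based): evaluators consume DataFrames.
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.sql.DataFrame

def mllibAccuracy(predictionAndLabel: RDD[(Double, Double)]): Double =
  new MulticlassMetrics(predictionAndLabel).accuracy

def mlAccuracy(predictions: DataFrame): Double =
  new MulticlassClassificationEvaluator()
    .setMetricName("accuracy")
    .evaluate(predictions) // expects "prediction" and "label" columns by default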

What are precision and recall metrics?

Precision and recall are two extremely important model evaluation metrics. While precision refers to the percentage of your results which are relevant, recall refers to the percentage of total relevant results correctly classified by your algorithm.

What is precision recall accuracy in ML?

Accuracy tells you how many times the ML model was correct overall. Precision is how good the model is at predicting a specific category. Recall tells you how many times the model was able to detect a specific category.
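To make those three definitions concrete, here is a tiny self-contained sketch with made-up confusion-matrix counts for a binary classifier (the numbers are illustrative only):

// Illustrative counts: 100 predictions in total (made up).
val tp = 40.0 // positives correctly predicted positive
val fp = 10.0 // negatives wrongly predicted positive
val fn = 5.0  // positives wrongly predicted negative
val tn = 45.0 // negatives correctly predicted negative

val accuracy  = (tp + tn) / (tp + fp + fn + tn) // 0.85:  correct overall
val precision = tp / (tp + fp)                  // 0.80:  how trustworthy a "positive" call is
val recall    = tp / (tp + fn)                  // ~0.89: how many real positives were found

println(s"accuracy=$accuracy precision=$precision recall=$recall")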


2 Answers

I tried using MLlib to evaluate the result.

I transformed the Dataset to an RDD, then used MulticlassMetrics from MLlib.

You can see a demo here: Spark DecisionTreeExample.scala

import org.apache.spark.ml.Transformer
// MetadataUtils is Spark-internal (private[spark]); this snippet lives in the Spark source tree.
import org.apache.spark.ml.util.MetadataUtils
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.sql.DataFrame

private[ml] def evaluateClassificationModel(
      model: Transformer,
      data: DataFrame,
      labelColName: String): Unit = {
    val fullPredictions = model.transform(data).cache()
    // Extract the prediction and label columns as RDD[Double] for MLlib.
    val predictions = fullPredictions.select("prediction").rdd.map(_.getDouble(0))
    val labels = fullPredictions.select(labelColName).rdd.map(_.getDouble(0))
    // Print number of classes for reference.
    val numClasses = MetadataUtils.getNumClasses(fullPredictions.schema(labelColName)) match {
      case Some(n) => n
      case None => throw new RuntimeException(
        "Unknown failure when indexing labels for classification.")
    }
    // MulticlassMetrics consumes an RDD of (prediction, label) pairs.
    val accuracy = new MulticlassMetrics(predictions.zip(labels)).accuracy
    println(s"  Accuracy ($numClasses classes): $accuracy")
  }
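Building on the same (prediction, label) RDD, MLlib's metrics classes can produce the rest of the metrics the question asks for. A sketch under the same assumptions (note that for a meaningful ROC curve, BinaryClassificationMetrics should ideally be fed raw scores rather than hard 0/1 predictions):

import org.apache.spark.mllib.evaluation.{BinaryClassificationMetrics, MulticlassMetrics}
import org.apache.spark.rdd.RDD

def printAllMetrics(predictions: RDD[Double], labels: RDD[Double]): Unit = {
  val predictionAndLabels = predictions.zip(labels)

  val multiclass = new MulticlassMetrics(predictionAndLabels)
  println(s"Accuracy:           ${multiclass.accuracy}")
  println(s"Weighted precision: ${multiclass.weightedPrecision}")
  println(s"Weighted recall:    ${multiclass.weightedRecall}")
  println(s"Confusion matrix:\n${multiclass.confusionMatrix}")

  // Per-class precision/recall for the binary labels 0.0 and 1.0.
  multiclass.labels.foreach { l =>
    println(s"Precision($l) = ${multiclass.precision(l)}, Recall($l) = ${multiclass.recall(l)}")
  }

  // Area under ROC. BinaryClassificationMetrics expects (score, label) pairs;
  // passing hard predictions works but yields a coarse one-point ROC curve.
  val binary = new BinaryClassificationMetrics(predictionAndLabels)
  println(s"Area under ROC: ${binary.areaUnderROC()}")
}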
answered Oct 03 '22 by ShuoshuoFan


You can follow the official Evaluation Metrics guide provided by Apache Spark. It covers all of the evaluation metrics you are after, including:

  • Precision (Positive Predictive Value)
  • Recall (True Positive Rate)
  • F-measure
  • Receiver Operating Characteristic (ROC)
  • Area Under ROC Curve
  • Area Under Precision-Recall Curve

Here is the link : https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html
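For example, the binary classification section of that guide works from an RDD of (score, label) pairs. A minimal sketch along those lines (the scoreAndLabels input is assumed to come from your trained model; it is a placeholder here):

import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.rdd.RDD

// scoreAndLabels: RDD[(Double, Double)] of (predicted score, true label) pairs.
def printBinaryMetrics(scoreAndLabels: RDD[(Double, Double)]): Unit = {
  val metrics = new BinaryClassificationMetrics(scoreAndLabels)

  // Precision and recall at each score threshold.
  metrics.precisionByThreshold.foreach { case (t, p) => println(s"Threshold $t: precision $p") }
  metrics.recallByThreshold.foreach { case (t, r) => println(s"Threshold $t: recall $r") }

  // Aggregate curve areas.
  println(s"Area under precision-recall curve: ${metrics.areaUnderPR()}")
  println(s"Area under ROC: ${metrics.areaUnderROC()}")
}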

answered Oct 03 '22 by Darshan