How to plot ROC curve and precision-recall curve from BinaryClassificationMetrics

I am trying to plot the ROC curve and the precision-recall curve. The points are generated by Spark MLlib's BinaryClassificationMetrics, following the Spark evaluation metrics guide: https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html. This is the output I get:

    [(1.0,1.0), (0.0,0.4444444444444444)] - Precision
    [(1.0,1.0), (0.0,1.0)] - Recall
    [(1.0,1.0), (0.0,0.6153846153846153)] - F1Measure
    [(0.0,1.0), (1.0,1.0), (1.0,0.4444444444444444)] - Precision-Recall curve
    [(0.0,0.0), (0.0,1.0), (1.0,1.0), (1.0,1.0)] - ROC curve

asked Jul 05 '16 16:07

Desanth pv

1 Answer

It looks like you have hit the same problem I did. Either the pairs passed to the metrics constructor are in the wrong order (BinaryClassificationMetrics expects (score, label)), or you are passing the hard prediction instead of the probability. For example, if you are using BinaryClassificationMetrics with a RandomForestClassifier, then according to this page (under outputs) the model produces both a "prediction" and a "probability" column.

Then initialize your Metrics thus:

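    import org.apache.spark.ml.linalg.DenseVector // Spark 2.x+; on 1.x use org.apache.spark.mllib.linalg.DenseVector
    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
    import org.apache.spark.sql.functions.col
    val metrics = new BinaryClassificationMetrics(predictionsWithResponse
      .select(col("probability"), col("myLabel")) // (score, label) order; "myLabel" must be a Double column
      .rdd.map(r => (r.getAs[DenseVector](0)(1), r.getDouble(1))))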

The getAs[DenseVector](0)(1) call reads the probability vector from the first selected column and takes its element at index 1, which is the probability of class 1.
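To see why hard predictions collapse the curve, here is a minimal plain-Scala sketch (no Spark needed) that roughly mimics how the curve is built, using every distinct score as a threshold. The `RocSketch` object, the `rocPoints` helper, and the sample data are all made up for illustration, and the sketch assumes both classes are present in the labels.

```scala
// Hypothetical helper, not part of Spark: builds ROC points from (score, label)
// pairs by using every distinct score as a threshold, highest first.
object RocSketch {
  def rocPoints(scoreAndLabels: Seq[(Double, Double)]): Seq[(Double, Double)] = {
    // Assumes at least one positive and one negative label.
    val positives = scoreAndLabels.count(_._2 == 1.0).toDouble
    val negatives = scoreAndLabels.size - positives
    val thresholds = scoreAndLabels.map(_._1).distinct.sorted.reverse
    val inner = thresholds.map { t =>
      // Everything scored at or above the threshold is predicted positive.
      val predictedPositive = scoreAndLabels.filter(_._1 >= t)
      val tp = predictedPositive.count(_._2 == 1.0)
      val fp = predictedPositive.size - tp
      (fp / negatives, tp / positives) // (false positive rate, true positive rate)
    }
    // Add the (0,0) and (1,1) endpoints around the per-threshold points.
    (0.0, 0.0) +: inner :+ (1.0, 1.0)
  }

  def main(args: Array[String]): Unit = {
    val labels = Seq(1.0, 1.0, 0.0, 1.0, 0.0) // made-up labels
    val hard   = Seq(1.0, 1.0, 1.0, 0.0, 0.0) // made-up 0/1 predictions
    val soft   = Seq(0.9, 0.8, 0.6, 0.4, 0.1) // made-up probabilities
    println(rocPoints(hard.zip(labels)).size) // two thresholds plus two endpoints
    println(rocPoints(soft.zip(labels)).size) // five thresholds plus two endpoints
  }
}
```

With 0/1 predictions there are only two distinct scores, so the curve has four points including the endpoints, which matches the shape of the output in the question. With probabilities, each distinct value contributes its own point.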

As for the actual plotting, that's up to you (there are many fine tools for that), but at least you will get more than one point on your curve besides the endpoints.

And in case it's not clear:

metrics.roc().collect() will give you the data for the ROC curve, as tuples of (false positive rate, true positive rate).
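Similarly, metrics.pr().collect() gives the precision-recall curve as tuples of (recall, precision). One simple way to get either curve into a plotting tool is to dump the collected points as CSV. The sketch below is plain Scala; `CurveCsv` and `toCsv` are hypothetical names of my own, and the sample points are the ROC values from the question rather than anything computed here.

```scala
object CurveCsv {
  // Hypothetical helper: renders curve points as CSV text with a header row.
  def toCsv(xName: String, yName: String, points: Seq[(Double, Double)]): String =
    (s"$xName,$yName" +: points.map { case (x, y) => s"$x,$y" }).mkString("\n")

  def main(args: Array[String]): Unit = {
    // In a Spark job these points would come from metrics.roc().collect()
    // or metrics.pr().collect(); here the ROC points from the question are reused.
    val roc = Seq((0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 1.0))
    println(toCsv("fpr", "tpr", roc))
  }
}
```

The resulting text can be saved to a file and loaded by whatever plotting tool you prefer.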

answered Oct 11 '22 19:10

Jeremy

Tags: machine-learning, apache-spark, apache-spark-mllib

Desanth pv

People also ask

1 Answers

Jeremy

Recent Activity

Donate For Us