Get same value for precision, recall and F score in Apache Spark Logistic regression algorithm

Question

I have implemented logistic regression for a classification problem, and I get the same value for precision, recall, and F1 score. Is it normal for these to be identical? I ran into the same thing with decision trees and random forests: precision, recall, and F1 score all came out equal.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.classification.LogisticRegressionModel;
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS;
import org.apache.spark.mllib.evaluation.MulticlassMetrics;
import org.apache.spark.mllib.regression.LabeledPoint;
import scala.Tuple2;

        // Run training algorithm to build the model.
        final LogisticRegressionModel model = new LogisticRegressionWithLBFGS()
                .setNumClasses(13)
                .run(data.rdd());

        // Compute (prediction, label) pairs on the test set.
        JavaRDD<Tuple2<Object, Object>> predictionAndLabels = testData.map(
                new Function<LabeledPoint, Tuple2<Object, Object>>() {
                    public Tuple2<Object, Object> call(LabeledPoint p) {
                        Double prediction = model.predict(p.features());
                        return new Tuple2<Object, Object>(prediction, p.label());
                    }
                }
        );

        // Get evaluation metrics.
        MulticlassMetrics metrics = new MulticlassMetrics(predictionAndLabels.rdd());
        double precision = metrics.precision();
        System.out.println("Precision = " + precision);

        double recall = metrics.recall();
        System.out.println("Recall = " + recall);

        double FScore = metrics.fMeasure();
        System.out.println("F Measure = " + FScore);

Avinash · Accepted Answer

I am facing the same problem. I have tried decision trees, random forests, and GBT, and every time I get identical precision, recall, and F1 score. The accuracy, calculated from the confusion matrix, is also the same value.

So I wrote my own code to compute accuracy, precision, recall, and F1 score directly from the confusion matrix.

from pyspark.ml.classification import RandomForestClassifier
from pyspark.mllib.evaluation import MulticlassMetrics

# Train the model on the training split and score the test split
rf = RandomForestClassifier(labelCol='label', featuresCol='features')
fit = rf.fit(trainingData)
transformed = fit.transform(testData)

# MulticlassMetrics expects an RDD of (prediction, label) pairs
results = transformed.select(['prediction', 'label'])
predictionAndLabels = results.rdd
metrics = MulticlassMetrics(predictionAndLabels)

# Binary case: rows of the confusion matrix are actual labels, columns are predicted labels
cm = metrics.confusionMatrix().toArray()
accuracy = (cm[0][0] + cm[1][1]) / cm.sum()
# Precision and recall for class 0
precision = cm[0][0] / (cm[0][0] + cm[1][0])
recall = cm[0][0] / (cm[0][0] + cm[0][1])
f1 = 2 * precision * recall / (precision + recall)
print("RandomForestClassifier: accuracy,precision,recall,f1", accuracy, precision, recall, f1)
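
A likely explanation for the identical numbers: in the Spark versions where `precision()`, `recall()` and `fMeasure()` can be called without an argument, those overall values are computed across all classes together (they were later deprecated in favor of `accuracy`). Counted that way, every misclassified sample is one false positive for the predicted class and one false negative for the true class, so the two totals match and all three metrics collapse to accuracy. The sketch below is plain Python rather than Spark code; `per_class_metrics` and `micro_metrics` are hypothetical helper names, and the matrix is made-up illustration data laid out like the confusion matrix above (rows are actual labels, columns are predicted labels).

```python
def per_class_metrics(cm):
    """Return {class_index: (precision, recall, f1)} for a square confusion matrix.

    cm[i][j] is the number of samples with actual class i and predicted class j.
    """
    n = len(cm)
    out = {}
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: everything predicted as k
        actual_k = sum(cm[k])                          # row sum: everything that really is k
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        out[k] = (precision, recall, f1)
    return out


def micro_metrics(cm):
    """Return (precision, recall, f1) with TP, FP and FN pooled over all classes."""
    n = len(cm)
    tp = sum(cm[k][k] for k in range(n))
    # Each off-diagonal cell is a false positive for its column class...
    fp = sum(cm[i][k] for k in range(n) for i in range(n) if i != k)
    # ...and a false negative for its row class, so fp == fn always holds.
    fn = sum(cm[k][j] for k in range(n) for j in range(n) if j != k)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical 2x2 confusion matrix, for illustration only
cm = [[50, 10],
      [5, 35]]
accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))

print("accuracy:", accuracy)
print("pooled (precision, recall, f1):", micro_metrics(cm))
print("per class:", per_class_metrics(cm))
```

The pooled values all equal the accuracy, while the per-class values differ from each other, which is why computing the metrics per class (as in the code above) gives distinct numbers.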

user25260 · Answer

For binary classification, you can pass the label (for example, 1) as an argument to the precision and recall methods. It worked for me. For multiclass classification, pass the label of the class whose precision and recall you want to calculate.

// Pass the class label (as a double) to get the metrics for that class.
double precision = metrics.precision(1.0);
System.out.println("Precision = " + precision);
double recall = metrics.recall(1.0);
System.out.println("Recall = " + recall);
double FScore = metrics.fMeasure(1.0);
System.out.println("F Measure = " + FScore);
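
To make clear what a per-label metric counts, here is a minimal plain-Python sketch that works directly on (prediction, label) pairs, the same shape of data that `MulticlassMetrics` is built from. It is not Spark's implementation; `label_metrics` is a hypothetical helper name and the pairs are made-up illustration data.

```python
def label_metrics(prediction_and_labels, label):
    """Return (precision, recall, f1) for one class, treating it as the positive class."""
    tp = fp = fn = 0
    for prediction, actual in prediction_and_labels:
        if prediction == label and actual == label:
            tp += 1   # predicted the class and it was right
        elif prediction == label:
            fp += 1   # predicted the class but the true label differs
        elif actual == label:
            fn += 1   # missed a sample that belongs to the class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical (prediction, label) pairs for a 3-class problem
pairs = [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0), (0.0, 0.0),
         (1.0, 1.0), (2.0, 2.0), (2.0, 1.0)]

for cls in (0.0, 1.0, 2.0):
    print(cls, label_metrics(pairs, cls))
```

Because only one class is treated as positive at a time, false positives and false negatives are no longer forced to be equal, so precision, recall, and F1 can differ for each label.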

Tags:

apache-spark

performance-measuring

Thamali Wijewardhana
