I have implemented logistic regression for a classification problem, and I get the same value for precision, recall and F1 score. Is it normal for all three to be identical? I saw the same thing when implementing decision trees and random forests: precision, recall and F1 score all came out equal.
    // Run training algorithm to build the model.
    final LogisticRegressionModel model = new LogisticRegressionWithLBFGS()
        .setNumClasses(13)
        .run(data.rdd());

    // Compute raw scores on the test set.
    JavaRDD<Tuple2<Object, Object>> predictionAndLabels = testData.map(
        new Function<LabeledPoint, Tuple2<Object, Object>>() {
            public Tuple2<Object, Object> call(LabeledPoint p) {
                Double prediction = model.predict(p.features());
                return new Tuple2<Object, Object>(prediction, p.label());
            }
        }
    );

    // Get evaluation metrics.
    MulticlassMetrics metrics = new MulticlassMetrics(predictionAndLabels.rdd());
    double precision = metrics.precision();
    System.out.println("Precision = " + precision);
    double recall = metrics.recall();
    System.out.println("Recall = " + recall);
    double FScore = metrics.fMeasure();
    System.out.println("F Measure = " + FScore);
I ran into the same problem. I tried decision trees, random forests and GBT, and every time I got identical values for precision, recall and F1 score. The accuracy (calculated from the confusion matrix) is also the same.
So I now compute the measures myself, using my own formulas on the confusion matrix:
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.mllib.evaluation import MulticlassMetrics

    # Train the model on the training split and score the test split
    rf = RandomForestClassifier(labelCol='label', featuresCol='features')
    fit = rf.fit(trainingData)
    transformed = fit.transform(testData)

    # MulticlassMetrics expects an RDD of (prediction, label) pairs
    results = transformed.select(['prediction', 'label'])
    predictionAndLabels = results.rdd
    metrics = MulticlassMetrics(predictionAndLabels)

    # Confusion matrix: rows are actual labels, columns are predicted labels
    cm = metrics.confusionMatrix().toArray()

    # Binary case; precision and recall below are for class 0
    accuracy = (cm[0][0] + cm[1][1]) / cm.sum()
    precision = cm[0][0] / (cm[0][0] + cm[1][0])
    recall = cm[0][0] / (cm[0][0] + cm[0][1])
    print("RandomForestClassifier: accuracy,precision,recall", accuracy, precision, recall)
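A likely reason the library values coincide, offered as a sketch rather than a statement about any particular Spark version: when every sample gets exactly one predicted class, each misclassification counts once as a false positive (for the predicted class) and once as a false negative (for the actual class). Summed over all classes, micro-averaged precision and recall both reduce to correct / total, which is the accuracy, and F1 follows. The plain-Python example below uses a made-up 3-class confusion matrix and a hypothetical helper `micro_averaged_metrics`; it does not call Spark.

```python
def micro_averaged_metrics(cm):
    """Micro-averaged precision, recall, F1 and accuracy.

    cm[i][j] = number of samples with actual class i predicted as class j
    (assumed layout: rows are actual labels, columns are predicted labels).
    """
    n = len(cm)
    # Per-class true positives, false positives and false negatives
    tp = [cm[k][k] for k in range(n)]
    fp = [sum(cm[i][k] for i in range(n)) - cm[k][k] for k in range(n)]
    fn = [sum(cm[k][j] for j in range(n)) - cm[k][k] for k in range(n)]

    # Micro-averaging pools the counts over all classes before dividing
    precision = sum(tp) / (sum(tp) + sum(fp))
    recall = sum(tp) / (sum(tp) + sum(fn))
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = sum(tp) / sum(sum(row) for row in cm)
    return precision, recall, f1, accuracy


# Made-up confusion matrix for illustration only
example_cm = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 1, 9],
]

precision, recall, f1, accuracy = micro_averaged_metrics(example_cm)
print("precision:", precision)
print("recall:   ", recall)
print("f1:       ", f1)
print("accuracy: ", accuracy)
```

The pooled false positives and pooled false negatives are both just the off-diagonal total, so the four numbers agree up to floating-point rounding. Getting different values requires per-class or macro/weighted metrics.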
For binary classification, you can pass the label (1) as an argument to the precision and recall methods. It worked for me. For multiclass classification, pass the label of the class for which you want the precision and recall values.
    // Per-class metrics for label 1.0 (the label is passed as a double)
    double precision = metrics.precision(1.0);
    System.out.println("Precision = " + precision);
    double recall = metrics.recall(1.0);
    System.out.println("Recall = " + recall);
    double FScore = metrics.fMeasure(1.0);
    System.out.println("F Measure = " + FScore);
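To see what passing a label changes, here is a plain-Python sketch of per-class precision, recall and F1 computed from a confusion matrix. It assumes rows are actual labels and columns are predicted labels (the layout Spark documents for `confusionMatrix()`); the matrix values and the helper name `per_class_metrics` are made up for illustration, and this is not Spark's implementation.

```python
def per_class_metrics(cm):
    """Return {class_index: (precision, recall, f1)} for each class.

    cm[i][j] = number of samples with actual class i predicted as class j.
    A metric is reported as 0.0 when its denominator is zero.
    """
    n = len(cm)
    result = {}
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k][j] for j in range(n))     # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        result[k] = (precision, recall, f1)
    return result


# Made-up confusion matrix for illustration only
example_cm = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 1, 9],
]

per_class = per_class_metrics(example_cm)
for label, (precision, recall, f1) in per_class.items():
    print(label, precision, recall, f1)
```

For this matrix, precision and recall differ within each class, which is the behaviour you would expect from the per-label calls, in contrast to the single overall value returned when no label is given.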