How to compute precision, recall and F1 score of an imbalanced dataset for k-fold cross-validation?

I have an imbalanced dataset for a binary classification problem. I built a random forest classifier and used k-fold cross-validation with 10 folds.

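from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier

# random_state only takes effect when shuffle=True; recent scikit-learn versions raise an error otherwise
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=50)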

I got the results of the 10 folds:

results = model_selection.cross_val_score(model, features, labels, cv=kfold)
print(results)
[ 0.60666667  0.60333333  0.52333333  0.73        0.75333333  0.72        0.7
  0.73        0.83666667  0.88666667]

I calculated the accuracy by taking the mean and standard deviation of the results:

print("Accuracy: %.3f%% (%.3f%%)" % (results.mean() * 100.0, results.std() * 100.0))
Accuracy: 70.900% (10.345%)

I computed my predictions as follows:

predictions = cross_val_predict(model, features, labels, cv=10)

Since this is an imbalanced dataset, I would like to calculate the precision, recall and F1 score of each fold and average the results. How can I calculate these values in Python?

asked Oct 06 '17 04:10

Jayashree

1 Answer

When you use the cross_validate function (unlike cross_val_score, it accepts several metrics at once), you can specify which scores to calculate on each fold:

from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

# One scorer per metric; the dictionary keys name the entries in the results
scoring = {'accuracy': make_scorer(accuracy_score),
           'precision': make_scorer(precision_score),
           'recall': make_scorer(recall_score),
           'f1_score': make_scorer(f1_score)}

# random_state only takes effect when shuffle=True
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=50)

results = model_selection.cross_validate(estimator=model,
                                         X=features,
                                         y=labels,
                                         cv=kfold,
                                         scoring=scoring)

After cross-validation, you get a results dictionary with the keys 'test_accuracy', 'test_precision', 'test_recall' and 'test_f1_score' (alongside 'fit_time' and 'score_time'), each holding an array with the value of that metric on every fold. For each metric you can calculate the mean and standard deviation with np.mean(results[key]) and np.std(results[key]), where key is one of those names.
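As a minimal end-to-end sketch of the per-fold averaging described above, the following uses cross_validate with scikit-learn's built-in scorer names. Several things here are assumptions rather than part of the question: the data comes from make_classification as a stand-in for features and labels, the roughly 90/10 class split is made up, and StratifiedKFold replaces KFold so that each fold keeps approximately the same class ratio, which is often helpful with imbalanced data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic imbalanced binary data (assumed stand-in for the real features/labels)
features, labels = make_classification(
    n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=42
)

# Stratified folds preserve the class proportions in every train/test split
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=50, random_state=42)

# Built-in scorer names; precision, recall and f1 refer to the positive class (label 1)
scoring = ["accuracy", "precision", "recall", "f1"]
results = cross_validate(model, features, labels, cv=cv, scoring=scoring)

# Per-fold scores live under "test_<name>"; summarise each as (mean, std)
summary = {}
for name in scoring:
    fold_scores = results["test_" + name]
    summary[name] = (np.mean(fold_scores), np.std(fold_scores))
    print("%s: %.3f (+/- %.3f)" % (name, summary[name][0], summary[name][1]))
```

On imbalanced data the accuracy line will usually look much better than recall or F1, which is the reason for reporting all four. If a fold happens to contain no predicted positives, scikit-learn reports precision as 0 for that fold and emits a warning.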

answered Sep 21 '22 19:09

Eduard Ilyasov

Tags: python, supervised-learning, scikit-learn, random-forest, cross-validation
