How to get average score of K-Fold cross validation with sklearn

Question

I am applying a decision tree with K-fold cross validation using sklearn. Can someone help me show its average score? Below is my code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix,classification_report

dta=pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/blood-transfusion/transfusion.data")

X=dta.drop("whether he/she donated blood in March 2007",axis=1)

X=X.values # convert dataframe to numpy array

y=dta["whether he/she donated blood in March 2007"]

y=y.values # convert dataframe to numpy array

kf = KFold(n_splits=10) # defaults: shuffle=False, random_state=None

clf_tree=DecisionTreeClassifier()

for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    clf=clf_tree.fit(X_train,y_train)
    print("classification_report_tree", 
           classification_report(y_test,clf_tree.predict(X_test)))

Vivek Kumar · Accepted Answer

If you only want accuracy, you can simply use cross_val_score():

from sklearn.model_selection import KFold, cross_val_score

kf = KFold(n_splits=10)
clf_tree=DecisionTreeClassifier()
scores = cross_val_score(clf_tree, X, y, cv=kf) # one accuracy score per fold

avg_score = np.mean(scores)
print(avg_score)

Here cross_val_score takes your original X and y as input (without splitting them into train and test sets). It splits them into train and test automatically, fits the model on the train data, and scores it on the test data. Those scores are returned in the scores variable.

So with 10 folds, 10 scores are returned in the scores variable. You can then just take their average.
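To make the description above concrete, here is a minimal sketch comparing cross_val_score with a hand-written loop over the same folds. It assumes a synthetic dataset from make_classification as a stand-in for the transfusion data, and a fixed random_state on the tree so that both runs build identical trees; neither is part of the original answer.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; the question uses the transfusion dataset).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

kf = KFold(n_splits=10)
clf_tree = DecisionTreeClassifier(random_state=0)  # fixed seed so both runs match

# One call: returns one accuracy value per fold.
scores = cross_val_score(clf_tree, X, y, cv=kf)

# Roughly the same work by hand: fit a fresh copy of the estimator on each
# training split, then score it on the held-out split.
manual_scores = []
for train_index, test_index in kf.split(X):
    fold_clf = clone(clf_tree).fit(X[train_index], y[train_index])
    manual_scores.append(fold_clf.score(X[test_index], y[test_index]))

print("number of scores:", len(scores))
print("mean via cross_val_score:", np.mean(scores))
print("mean via manual loop:", np.mean(manual_scores))
```

Reporting np.std(scores) next to the mean is a common way to show how much the score varies between folds.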

Gambit1614 · Answer

You can try the precision_recall_fscore_support metric from sklearn and then average the results of each fold per class. I am assuming here that you need the scores averaged per class.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV,cross_val_score

dta=pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/blood-transfusion/transfusion.data")

X=dta.drop("whether he/she donated blood in March 2007",axis=1)

X=X.values # convert dataframe to numpy array

y=dta["whether he/she donated blood in March 2007"]

y=y.values # convert dataframe to numpy array

kf = KFold(n_splits=10) # defaults: shuffle=False, random_state=None

clf_tree=DecisionTreeClassifier()

score_array =[]
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    clf=clf_tree.fit(X_train,y_train)
    y_pred = clf.predict(X_test)
    score_array.append(precision_recall_fscore_support(y_test, y_pred, average=None))

avg_score = np.mean(score_array,axis=0)
print(avg_score)

#Output:
#[[  0.77302466   0.30042282]
# [  0.81755068   0.22192344]
# [  0.79063779   0.24414489]
# [ 57.          17.8       ]]

Now, to get the precision of class 0, you can use avg_score[0][0]. The recall is in the second row (i.e. for class 0, it is avg_score[1][0]), while the F-score and support are in the third and fourth rows respectively.
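As a small sketch of the indexing described above, the four rows of the averaged array can be unpacked into named variables. The synthetic dataset, the shuffled folds, and the explicit labels=[0, 1] and zero_division=0 arguments are assumptions added here to keep the example self-contained and the per-class shape fixed; they are not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; the answer uses the transfusion dataset).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
clf_tree = DecisionTreeClassifier(random_state=0)

score_array = []
for train_index, test_index in kf.split(X):
    clf_tree.fit(X[train_index], y[train_index])
    y_pred = clf_tree.predict(X[test_index])
    # labels=[0, 1] keeps two columns per fold even if a fold lacks a class.
    score_array.append(
        precision_recall_fscore_support(
            y[test_index], y_pred, labels=[0, 1], average=None, zero_division=0
        )
    )

# Shape (4, 2): rows are precision, recall, F-score, support; columns are classes.
avg_score = np.mean(score_array, axis=0)
precision, recall, fscore, support = avg_score

for label in (0, 1):
    print(
        f"class {label}: precision={precision[label]:.3f} "
        f"recall={recall[label]:.3f} fscore={fscore[label]:.3f} "
        f"avg support={support[label]:.1f}"
    )
```

Here precision[0] is the same value as avg_score[0][0], and the averaged support row adds up to the number of test samples per fold.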

Tags:

scikit-learn

cross-validation

Ngọc Vũ Đình