Which accuracy score to use for the Mean Decrease Accuracy with the scikit RandomForestClassifier

Question

I've been running the implementation of the 'Mean Decrease Accuracy' measure that is shown on this website:

In the example the author uses the random forest regressor RandomForestRegressor, but I am using the random forest classifier RandomForestClassifier. My question is therefore whether I should also use r2_score to measure accuracy, or whether I should switch to classic accuracy (accuracy_score) or the Matthews correlation coefficient (matthews_corrcoef).

Does anybody here know whether I should switch, and why?

Thanks for any help!

Here is the code from the website in case you are too lazy to click :)

from collections import defaultdict

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import ShuffleSplit

# `boston` (the dataset) and `names` (its feature names) are assumed to be
# loaded already, as on the original website.
X = boston["data"]
Y = boston["target"]

rf = RandomForestRegressor()
scores = defaultdict(list)

# cross-validate the scores on a number of different random splits of the data
for train_idx, test_idx in ShuffleSplit(n_splits=100, test_size=.3).split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    Y_train, Y_test = Y[train_idx], Y[test_idx]
    rf.fit(X_train, Y_train)
    # baseline score on the untouched test set
    acc = r2_score(Y_test, rf.predict(X_test))
    for i in range(X.shape[1]):
        # shuffle one column in place and measure the relative drop in score
        X_t = X_test.copy()
        np.random.shuffle(X_t[:, i])
        shuff_acc = r2_score(Y_test, rf.predict(X_t))
        scores[names[i]].append((acc - shuff_acc) / acc)
print("Features sorted by their score:")
print(sorted([(round(np.mean(score), 4), feat) for
              feat, score in scores.items()], reverse=True))

Jianxun Li · Accepted Answer

r2_score is for regression (continuous response variable), whereas classic classification (discrete categorical variable) metrics such as accuracy_score, f1_score and roc_auc_score (the last two are most appropriate if you have unbalanced y-labels) are the right choices for your task.
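To make the metric choice concrete, here is a minimal sketch that scores one RandomForestClassifier with all three metrics. The synthetic, unbalanced dataset from make_classification (roughly 90% / 10% classes) and every parameter value are assumptions for illustration only, not something taken from the question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary problem with unbalanced labels (illustrative assumption).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)              # hard labels for accuracy and F1
y_prob = clf.predict_proba(X_test)[:, 1]  # positive-class scores for ROC AUC

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_prob),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that roc_auc_score takes predicted probabilities (or decision scores) rather than hard labels, which is why predict_proba is used for it.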

Randomly shuffling each feature in the input data matrix and measuring the decline in these classification metrics sounds like a valid approach to ranking feature importances.
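The shuffling approach can be sketched for a classifier as follows. This mirrors the loop from the question but swaps r2_score for accuracy_score. The synthetic data, the feature names f0..f5 and the use of StratifiedShuffleSplit are my own assumptions, chosen so the example is self-contained.

```python
from collections import defaultdict

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic data (assumption): with shuffle=False the first three columns
# are the informative ones and the remaining three are pure noise.
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical feature names

rng = np.random.RandomState(0)
scores = defaultdict(list)

splitter = StratifiedShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
for train_idx, test_idx in splitter.split(X, y):
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    X_test, y_test = X[test_idx], y[test_idx]
    # baseline accuracy on the untouched test set
    acc = accuracy_score(y_test, clf.predict(X_test))
    for i in range(X.shape[1]):
        # permute one column in place and record the relative accuracy drop
        X_t = X_test.copy()
        rng.shuffle(X_t[:, i])
        shuff_acc = accuracy_score(y_test, clf.predict(X_t))
        scores[names[i]].append((acc - shuff_acc) / acc)

# highest mean decrease in accuracy first
ranking = sorted(((np.mean(s), feat) for feat, s in scores.items()),
                 reverse=True)
for mean_drop, feat in ranking:
    print(f"{feat}: {mean_drop:.4f}")
```

To rank by a different metric, replace accuracy_score with f1_score, or with roc_auc_score applied to predict_proba output. scikit-learn also provides sklearn.inspection.permutation_importance, which performs a similar permutation test and accepts a scoring argument.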

Tags: python, machine-learning, statistics, classification, scikit-learn

dmeu
