
ValueError: multiclass-multioutput format is not supported using sklearn roc_auc_score function

I am using logistic regression for prediction. My predictions are 0s and 1s. After training my model on the given data, and also when training on only the important features (X_important_train, see screenshot), I get a score of around 70%. But when I call roc_auc_score(X, y) or roc_auc_score(X_important_train, y_train), I get: ValueError: multiclass-multioutput format is not supported

Code:

# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

# Standardize features
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# Train the model using the training sets and check score
model = LogisticRegression()
model.fit(X, y)
model.score(X, y)

model.fit(X_important_train, y_train)
model.score(X_important_train, y_train)

roc_auc_score(X_important_train, y_train)

Screenshot:

[screenshot not available]

asked May 28 '18 12:05 by stone rock


1 Answer

First of all, the roc_auc_score function expects input arguments with the same shape.

sklearn.metrics.roc_auc_score(y_true, y_score, average='macro', sample_weight=None)

Note: this implementation is restricted to the binary classification task or multilabel classification task in label indicator format.

y_true : array, shape = [n_samples] or [n_samples, n_classes]
True binary labels in binary label indicators.

y_score : array, shape = [n_samples] or [n_samples, n_classes]
Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).

Now, the inputs are the true labels and the predicted scores, NOT the training features and labels that you pass in the example you posted. In more detail,

model.fit(X_important_train, y_train)
model.score(X_important_train, y_train)
# this is wrong: the feature matrix is passed as y_true and the labels as y_score
roc_auc_score(X_important_train, y_train)
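To see where the message comes from, it can help to check how scikit-learn classifies the first argument. The sketch below uses small made-up arrays (X_toy and y_toy are hypothetical stand-ins for X_important_train and y_train, and it assumes the features are integer-valued, which is what the error text suggests). It inspects both arrays with type_of_target and then repeats the failing call:

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.utils.multiclass import type_of_target

# Hypothetical toy data standing in for X_important_train and y_train
X_toy = np.array([[1, 2], [3, 1], [2, 3], [1, 3]])  # integer-valued feature matrix
y_toy = np.array([0, 1, 1, 0])                      # binary labels

# How scikit-learn interprets each array when it is used as a target
features_kind = type_of_target(X_toy)
labels_kind = type_of_target(y_toy)
print("feature matrix treated as:", features_kind)
print("label vector treated as:", labels_kind)

# Same argument order as in the question: features first, labels second
try:
    roc_auc_score(X_toy, y_toy)
    error = None
except ValueError as exc:
    error = exc
print("error:", error)
```

The feature matrix is a 2-D array with more than two distinct values, so it is not a valid y_true for this metric, while the 1-D label vector is.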

You should do something like:

# y_true holds the true labels for X_test_data
y_pred = model.predict(X_test_data)
roc_auc_score(y_true, y_pred)
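A fuller sketch of the same idea, in case it is useful. It assumes synthetic data from make_classification and a train/test split, since the original X and y are not shown, and the variable names are only illustrative. It also passes the predicted probability of the positive class from predict_proba instead of the hard 0/1 predictions, because y_score is documented as accepting probability estimates and that usually gives a more informative AUC:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumption: stands in for the real X, y)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Probability of the positive class (column 1), one value per test sample
y_score = model.predict_proba(X_test)[:, 1]

# First argument: true labels; second argument: scores for the same samples
auc = roc_auc_score(y_test, y_score)
print("ROC AUC on held-out data:", auc)
```

Both arguments are 1-D arrays of length n_samples here, which matches the shape requirement mentioned at the top of this answer.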
answered Nov 15 '22 06:11 by seralouk