AxisError: axis 1 is out of bounds for array of dimension 1 when calculating AUC

Question

I have a classification problem where I have the pixels values of an 8x8 image and the number the image represents and my task is to predict the number('Number' attribute) based on the pixel values using RandomForestClassifier. The values of the number values can be 0-9.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

forest_model = RandomForestClassifier(n_estimators=100, random_state=42)
forest_model.fit(train_df[input_var], train_df[target])
test_df['forest_pred'] = forest_model.predict_proba(test_df[input_var])[:,1]
roc_auc_score(test_df['Number'], test_df['forest_pred'], average = 'macro', multi_class="ovr")

Here it throws an AxisError.

Traceback (most recent call last):
  File "dap_hazi_4.py", line 44, in 
    roc_auc_score(test_df['Number'], test_df['forest_pred'], average = 'macro', multi_class="ovo")
  File "/home/balint/.local/lib/python3.6/site-packages/sklearn/metrics/_ranking.py", line 383, in roc_auc_score
    multi_class, average, sample_weight)
  File "/home/balint/.local/lib/python3.6/site-packages/sklearn/metrics/_ranking.py", line 440, in _multiclass_roc_auc_score
    if not np.allclose(1, y_score.sum(axis=1)):
  File "/home/balint/.local/lib/python3.6/site-packages/numpy/core/_methods.py", line 38, in _sum
    return umr_sum(a, axis, dtype, out, keepdims, initial, where)

AxisError: axis 1 is out of bounds for array of dimension 1

Lalith.Bharadwaj · Accepted Answer

Actually, as your problem is multi-class the labels must be one-hot encoded. When labels are one-hot encoded then the 'multi_class' arguments work. By providing one-hot encoded labels you can resolve the error.

Suppose, you have 100 test labels with 5 unique classes then your matrix size(test label's) must be (100,5) NOT (100,1)

bhola prasad · Answer

The error is due to multi-class problem that you are solving as others suggested. All you need to do is instead of predicting the class, you need to predict the probabilities. I had this same problem before, doing this solves it.

Here is how to do it -

# you might be predicting the class this way
pred = clf.predict(X_valid)

# change it to predict the probabilities which solves the AxisError problem.
pred_prob = clf.predict_proba(X_valid)
roc_auc_score(y_valid, pred_prob, multi_class='ovr')
0.8164900342274142

# shape before
pred.shape
(256,)
pred[:5]
array([1, 2, 1, 1, 2])

# shape after
pred_prob.shape
(256, 3)
pred_prob[:5]
array([[0.  , 1.  , 0.  ],
       [0.02, 0.12, 0.86],
       [0.  , 0.97, 0.03],
       [0.  , 0.8 , 0.2 ],
       [0.  , 0.42, 0.58]])

AxisError: axis 1 is out of bounds for array of dimension 1 when calculating AUC

Tags:

python-3.x

scikit-learn

multiclass-classification

Bálint Béres

2 Answers

Lalith.Bharadwaj

bhola prasad

Recent Activity

Donate For Us

AxisError: axis 1 is out of bounds for array of dimension 1 when calculating AUC

Tags:

python-3.x

scikit-learn

multiclass-classification

Bálint Béres

2 Answers

Lalith.Bharadwaj

bhola prasad

Related questions

Recent Activity

Donate For Us