I have a classification problem: I have the pixel values of 8x8 images and the number each image represents, and my task is to predict the number (the 'Number' attribute) from the pixel values using RandomForestClassifier. The number can be anything from 0 to 9.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
forest_model = RandomForestClassifier(n_estimators=100, random_state=42)
forest_model.fit(train_df[input_var], train_df[target])
test_df['forest_pred'] = forest_model.predict_proba(test_df[input_var])[:,1]
roc_auc_score(test_df['Number'], test_df['forest_pred'], average = 'macro', multi_class="ovr")
Here it throws an AxisError.
Traceback (most recent call last):
  File "dap_hazi_4.py", line 44, in <module>
    roc_auc_score(test_df['Number'], test_df['forest_pred'], average = 'macro', multi_class="ovo")
  File "/home/balint/.local/lib/python3.6/site-packages/sklearn/metrics/_ranking.py", line 383, in roc_auc_score
    multi_class, average, sample_weight)
  File "/home/balint/.local/lib/python3.6/site-packages/sklearn/metrics/_ranking.py", line 440, in _multiclass_roc_auc_score
    if not np.allclose(1, y_score.sum(axis=1)):
  File "/home/balint/.local/lib/python3.6/site-packages/numpy/core/_methods.py", line 38, in _sum
    return umr_sum(a, axis, dtype, out, keepdims, initial, where)
AxisError: axis 1 is out of bounds for array of dimension 1
Because your problem is multi-class, roc_auc_score needs one column per class rather than a single column. One way to resolve the error is to one-hot encode the labels, provided the scores you pass alongside them also have one column per class.
Suppose you have 100 test labels with 5 unique classes: the one-hot label matrix must then have shape (100, 5), NOT (100, 1), and the score matrix needs the same shape.
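To make the shapes concrete, here is a minimal sketch of the one-hot approach. It assumes scikit-learn's bundled digits dataset (8x8 images, labels 0-9) as a stand-in for the asker's data; the variable names and the train/test split are illustrative, not taken from the question. It also computes the score from the plain integer labels with multi_class="ovr" for comparison.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Stand-in data (assumption): 8x8 digit images flattened to 64 pixel values.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# One probability column per class: shape (n_samples, 10).
proba = clf.predict_proba(X_test)

# One-hot encode the true labels so they match the score matrix column for column.
y_onehot = label_binarize(y_test, classes=clf.classes_)

# With one-hot labels, each column is scored as its own binary problem.
auc_onehot = roc_auc_score(y_onehot, proba, average="macro")

# The same 2-D scores also work with the original integer labels.
auc_labels = roc_auc_score(y_test, proba, average="macro", multi_class="ovr")

print(y_onehot.shape, proba.shape)
print(auc_onehot, auc_labels)
```

Both calls receive a 2-D score matrix, which is the part that actually removes the AxisError; in this sketch the two macro-averaged scores should agree.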
As others suggested, the error comes from scoring a multi-class problem. roc_auc_score needs the predicted probability of every class, i.e. a 2-D array of shape (n_samples, n_classes), rather than the predicted class or a single probability column. I had this same problem before, and passing the full predict_proba output solved it.
Here is how to do it -
# you might be predicting the class this way
pred = clf.predict(X_valid)
# change it to predict the probabilities which solves the AxisError problem.
pred_prob = clf.predict_proba(X_valid)
roc_auc_score(y_valid, pred_prob, multi_class='ovr')
0.8164900342274142
# shape before
pred.shape
(256,)
pred[:5]
array([1, 2, 1, 1, 2])
# shape after
pred_prob.shape
(256, 3)
pred_prob[:5]
array([[0.  , 1.  , 0.  ],
       [0.02, 0.12, 0.86],
       [0.  , 0.97, 0.03],
       [0.  , 0.8 , 0.2 ],
       [0.  , 0.42, 0.58]])
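Applied to the setup in the question, the same idea might look like the sketch below. It again assumes scikit-learn's digits dataset in place of the asker's DataFrame, so X_test and y_test are hypothetical stand-ins for test_df[input_var] and test_df['Number']. It first reproduces the failure caused by the `[:, 1]` slice, then scores the full probability matrix.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): 8x8 digit images with labels 0-9.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

forest_model = RandomForestClassifier(n_estimators=100, random_state=42)
forest_model.fit(X_train, y_train)

proba = forest_model.predict_proba(X_test)  # shape (n_samples, 10)
one_column = proba[:, 1]                    # shape (n_samples,), as in the question

# Scoring a single column against 10-class labels fails.
# NumPy's AxisError is a subclass of ValueError, so this catches it.
try:
    roc_auc_score(y_test, one_column, average="macro", multi_class="ovr")
    failed = False
except ValueError as exc:
    failed = True
    print("single column fails:", exc)

# Passing every class's probability works.
auc = roc_auc_score(y_test, proba, average="macro", multi_class="ovr")
print("macro OvR AUC:", auc)
```

Note that a single DataFrame column such as test_df['forest_pred'] can only hold one value per row, so in this sketch the probability matrix is passed to roc_auc_score directly instead of being stored in the frame first.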