 

IndexError: too many indices for array while plotting ROC curve with scikit-learn?

I would like to plot the ROC curve that scikit-learn implements, so I tried the following:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
false_positive_rate, recall, thresholds = roc_curve(y_test, prediction[:, 1])
roc_auc = auc(false_positive_rate, recall)
plt.title('Receiver Operating Characteristic')
plt.plot(false_positive_rate, recall, 'b', label='AUC = %0.2f' % roc_auc)
plt.legend(loc='lower right')
plt.plot([0, 1], [0, 1], 'r--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.ylabel('Recall')
plt.xlabel('Fall-out')
plt.show()

And this is the output:

Traceback (most recent call last):
  File "/Users/user/script.py", line 62, in <module>
    false_positive_rate, recall, thresholds = roc_curve(y_test, prediction[:, 1])
IndexError: too many indices for array

Then from a previous question I tried this:

false_positive_rate, recall, thresholds = roc_curve(y_test, prediction)

And got this traceback:

/usr/local/lib/python2.7/site-packages/sklearn/metrics/metrics.py:705: DeprecationWarning: elementwise comparison failed; this will raise the error in the future.
  not (np.all(classes == [0, 1]) or
/usr/local/lib/python2.7/site-packages/sklearn/metrics/metrics.py:706: DeprecationWarning: elementwise comparison failed; this will raise the error in the future.
  np.all(classes == [-1, 1]) or
Traceback (most recent call last):
  File "/Users/user/PycharmProjects/TESIS_CODE/clasificacion_simple_v1.py", line 62, in <module>
    false_positive_rate, recall, thresholds = roc_curve(y_test, prediction)
  File "/usr/local/lib/python2.7/site-packages/sklearn/metrics/metrics.py", line 890, in roc_curve
    y_true, y_score, pos_label=pos_label, sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/site-packages/sklearn/metrics/metrics.py", line 710, in _binary_clf_curve
    raise ValueError("Data is not binary and pos_label is not specified")
ValueError: Data is not binary and pos_label is not specified

Then I also tried this:

false_positive_rate, recall, thresholds = roc_curve(y_test, prediction[0].values)

And this is the traceback:

AttributeError: 'numpy.int64' object has no attribute 'values'

Any idea how to correctly plot this metric? Thanks in advance!

This is the shape of the prediction variable:

print prediction.shape
(650,)

This is the shape of testing_matrix: (650, 9596)

john doe asked Nov 09 '22 19:11



1 Answer

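The variable prediction needs to be a 1-D array with the same shape as y_test; you can check this by inspecting the shape attribute, e.g. y_test.shape. Your prediction.shape is (650,), so it is already 1-D, which is why prediction[:, 1] fails with IndexError: too many indices for array (a 1-D array has only one axis to index). I think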

prediction[0].values 

raises

AttributeError: 'numpy.int64' object has no attribute 'values'

because prediction[0] is a single element (a numpy.int64 scalar), and .values is a pandas attribute that NumPy scalars do not have.
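A minimal sketch reproducing both errors with plain NumPy, assuming prediction holds hard integer class labels. The array below is a hypothetical stand-in for the asker's data, not the real thing:

```python
import numpy as np

# Hypothetical stand-in for the asker's `prediction`: 650 hard class labels (1-D)
prediction = np.zeros(650, dtype=np.int64)
print(prediction.shape)  # only one axis

# A 1-D array has a single axis, so asking for a second index raises IndexError
index_error = None
try:
    prediction[:, 1]
except IndexError as exc:
    index_error = exc
print(type(index_error).__name__, index_error)

# Indexing one element yields a NumPy scalar, which has no pandas-style `.values`
first = prediction[0]
print(type(first).__name__, hasattr(first, "values"))
```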

Update:

ValueError: Data is not binary and pos_label is not specified

I didn't notice this before. If your classes are not binary, you have to specify the pos_label parameter when calling roc_curve, so that it plots one class vs. the rest. For this to work, your class labels need to be integers. You can use:

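from sklearn.preprocessing import LabelEncoder
class_labels = LabelEncoder()
# Fit on the true labels, then reuse the same label-to-integer mapping for the predictions
y_test_le = class_labels.fit_transform(y_test)
prediction_le = class_labels.transform(prediction)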

prediction_le contains the predicted classes recoded as integers.
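A sketch of the one-vs-rest idea with pos_label, using hypothetical three-class string labels and hypothetical scores for one class. It assumes a recent scikit-learn, where roc_curve accepts a multiclass y_true as long as pos_label is given. Note that y_score should be a score for the positive class (for example a predicted probability), not a hard label:

```python
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import LabelEncoder

# Hypothetical three-class labels and scores (not the asker's data)
y_test = np.array(["tech", "sports", "politics", "tech", "sports", "tech"])
tech_scores = np.array([0.9, 0.2, 0.1, 0.7, 0.4, 0.8])  # e.g. P(class == "tech")

# Fit on the true labels; classes_ is sorted, so politics -> 0, sports -> 1, tech -> 2
encoder = LabelEncoder()
y_test_le = encoder.fit_transform(y_test)
tech_id = encoder.transform(["tech"])[0]

# pos_label marks "tech" as the positive class; every other class counts as negative
fpr, recall, thresholds = roc_curve(y_test_le, tech_scores, pos_label=tech_id)
roc_auc = auc(fpr, recall)
print(encoder.classes_, tech_id, roc_auc)
```

Repeating this with each class as pos_label (and that class's scores) gives one curve per class.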

Update 2:

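Your predictor is only returning one class, so you cannot get a meaningful ROC curve: with a single distinct score there is only one threshold, and the curve collapses to the diagonal from (0, 0) to (1, 1).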
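A minimal end-to-end sketch on synthetic data. make_classification and LogisticRegression are stand-ins chosen for illustration, not the asker's model or features. It shows two things: prediction[:, 1] works when the scores come from predict_proba, which returns one column per class, and a constant, single-class prediction only yields the chance diagonal:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the asker's (650, 9596) feature matrix
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict_proba is 2-D (n_samples, n_classes); column 1 scores the class clf.classes_[1]
scores = clf.predict_proba(X_test)[:, 1]
fpr, recall, thresholds = roc_curve(y_test, scores)
roc_auc = auc(fpr, recall)

# A predictor that always outputs one class gives a single distinct score,
# so roc_curve can only return the points (0, 0) and (1, 1)
constant = np.zeros_like(y_test)
fpr_c, recall_c, _ = roc_curve(y_test, constant)
print(round(roc_auc, 3), auc(fpr_c, recall_c))
```

With fpr and recall computed this way, the plotting code from the question can be used as is.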

JAB answered Nov 14 '22 21:11
