Comparing Arrays for Accuracy

Question

I have two arrays:

np.array(y_pred_list).shape
# returns (5, 47151, 10)
np.array(y_val_lst).shape
# returns (5, 47151, 10)

np.array(y_pred_list)[:, 2, :]
# returns 
array([[ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

np.array(y_val_lst)[:, 2, :]
# returns
array([[ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]], dtype=float32)

I would like to go through all 47151 examples and calculate the "accuracy", meaning the number of examples in y_pred_list that match y_val_lst, divided by 47151. What's the comparison function for this?

P. Camilleri · Accepted Answer

sklearn.metrics has a lot of useful classification scores, in particular accuracy_score(). See its documentation; you would use it as:

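import numpy as np
from sklearn.metrics import accuracy_score

# accuracy_score needs 2-D input; this scores the 5 rows of example 2.
acc = accuracy_score(np.array(y_val_lst)[:, 2, :],
                     np.array(y_pred_list)[:, 2, :])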
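accuracy_score() takes 2-D (n_samples, n_labels) indicator arrays, so it cannot score the full 3-D arrays in one call. Here is a minimal sketch of one way to cover every example, assuming axis 0 holds the 5 output positions and each position should be scored separately. The toy arrays y_val and y_pred are hypothetical stand-ins for the real data:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical stand-ins for the arrays in the question:
# shape (positions, examples, classes) = (2, 3, 4) instead of (5, 47151, 10).
y_val = np.zeros((2, 3, 4), dtype=np.float32)
y_val[0, 0, 1] = y_val[0, 1, 2] = y_val[0, 2, 3] = 1.0
y_val[1, 0, 0] = y_val[1, 1, 0] = y_val[1, 2, 2] = 1.0

y_pred = y_val.copy()
# Corrupt one prediction: position 1, example 2.
y_pred[1, 2] = [0.0, 1.0, 0.0, 0.0]

# Score each position along axis 0 on its own 2-D slice.
# For indicator input, a row counts as correct only if every label matches.
per_position = [accuracy_score(y_val[i], y_pred[i])
                for i in range(y_val.shape[0])]
print(per_position)
```

This gives one accuracy per position rather than a single number. If an example should only count as correct when all positions match, the NumPy approach in the other answer is closer to that.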

jez · Answer

Sounds like you want something like this:

accuracy = (np.array(y_pred_list) == np.array(y_val_lst)).all(axis=(0, 2)).mean()

...though since your arrays are clearly floating-point arrays, you might want to allow for numerical-precision errors rather than insisting on exact equality:

accuracy = (np.abs(np.array(y_pred_list) - np.array(y_val_lst)) < tolerance).all(axis=(0, 2)).mean()

(where, for example, tolerance = 1e-10)
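The tolerance check can also be written with np.isclose, which is a standard NumPy function. The sketch below uses small hypothetical arrays (y_val, y_pred) rather than the real data, and sets rtol=0 so that only the absolute tolerance applies:

```python
import numpy as np

# Hypothetical small arrays standing in for y_val_lst / y_pred_list.
# Shape (1, 2, 2): 1 position, 2 examples, 2 classes.
y_val = np.array([[[0.0, 1.0], [1.0, 0.0]]], dtype=np.float32)
# Example 0 differs only by a tiny rounding-sized error; example 1 is really wrong.
y_pred = y_val.astype(np.float64) + np.array([[[1e-12, 0.0], [0.0, 0.5]]])

tolerance = 1e-10
# np.isclose tests |a - b| <= atol + rtol * |b|; rtol=0 makes it purely absolute.
close = np.isclose(y_pred, y_val, rtol=0.0, atol=tolerance)
accuracy = close.all(axis=(0, 2)).mean()
print(accuracy)
```

The only difference from the hand-written expression is that np.isclose uses <= rather than <.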

The .all(axis=(0,2)) call records cases in which everything in its input is True (i.e. everything matches) when working along the dimension 0 (i.e. the one that has extent 5) and dimension 2 (the one that has extent 10). It outputs a one-dimensional array of length 47151. The .mean() call then gives you the proportion of matches in that sequence, which is my best guess as to what you mean by "over 47151".
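To make the shapes concrete, here is the same reduction on toy arrays of shape (2, 3, 4) standing in for (5, 47151, 10). The names y_val, y_pred, matches and per_example are illustrative only:

```python
import numpy as np

# Toy data: 2 positions, 3 examples, 4 classes.
y_val = np.zeros((2, 3, 4))
y_val[:, :, 1] = 1.0                    # every row is one-hot on class 1
y_pred = y_val.copy()
y_pred[1, 2] = [1.0, 0.0, 0.0, 0.0]     # make example 2 wrong at position 1

matches = (y_pred == y_val)             # elementwise booleans, shape (2, 3, 4)
per_example = matches.all(axis=(0, 2))  # shape (3,): True where all 2*4 entries agree
accuracy = per_example.mean()           # booleans average as 0/1

print(matches.shape, per_example.shape, accuracy)
```

Here per_example has one entry per example, and only example 2 fails, so accuracy is 2 out of 3.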

Tags:

python

arrays

numpy

Ritchie
