Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparing Arrays for Accuracy

I've a 2 arrays:

np.array(y_pred_list).shape
# returns (5, 47151, 10)
np.array(y_val_lst).shape
# returns (5, 47151, 10)

np.array(y_pred_list)[:, 2, :]
# returns 
array([[ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

np.array(y_val_lst)[:, 2, :]
# returns
array([[ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]], dtype=float32)

I would like to go through all 47151 examples, and calculate the "accuracy". Meaning the sum of those in y_pred_list that matches y_val_lst over 47151. What's the comparison function for this?

like image 970
Ritchie Avatar asked Oct 03 '16 16:10

Ritchie


2 Answers

You can find a lot of useful classification scores in sklearn.metrics, particularly accuracy_score(). See the doc here, you would use it as:

import sklearn
acc = sklearn.metrics.accuracy_score(np.array(y_val_list)[:, 2, :], 
                                     np.array(y_pred_list)[:, 2, :])
like image 124
P. Camilleri Avatar answered Sep 30 '22 07:09

P. Camilleri


Sounds like you want something like this:

accuracy = (y_pred_list == y_val_lst).all(axis=(0,2)).mean()

...though since your arrays are clearly floating-point arrays, you might want to allow for numerical-precision errors rather than insisting on exact equality:

accuracy = (numpy.abs(y_pred_list - y_val_lst) < tolerance ).all(axis=(0,2)).mean()

(where, for example, tolerance = 1e-10)

The .all(axis=(0,2)) call records cases in which everything in its input is True (i.e. everything matches) when working along the dimension 0 (i.e. the one that has extent 5) and dimension 2 (the one that has extent 10). It outputs a one-dimensional array of length 47151. The .mean() call then gives you the proportion of matches in that sequence, which is my best guess as to what you mean by "over 47151".

like image 33
jez Avatar answered Sep 30 '22 08:09

jez