I have two arrays:
np.array(y_pred_list).shape
# returns (5, 47151, 10)
np.array(y_val_lst).shape
# returns (5, 47151, 10)
np.array(y_pred_list)[:, 2, :]
# returns
array([[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
np.array(y_val_lst)[:, 2, :]
# returns
array([[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
I would like to go through all 47151 examples and calculate the "accuracy", meaning the number of examples in y_pred_list that match y_val_lst over 47151. What's the comparison function for this?
You can find a lot of useful classification scores in sklearn.metrics, particularly accuracy_score(). See the documentation for details. You would use it as:
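import numpy as np
from sklearn.metrics import accuracy_score
# Note: [:, 2, :] selects only the example at index 2, giving two (5, 10) arrays
acc = accuracy_score(np.array(y_val_lst)[:, 2, :],
                     np.array(y_pred_list)[:, 2, :])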
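The call above scores only the slice at index 2. If the goal is one score over every example, a possible sketch is to move the example axis to the front and flatten the rest, so that each row holds everything belonging to one example. For 2D binary indicator input, accuracy_score() computes subset accuracy, so a row counts as correct only if all of its entries match. The toy arrays y_val and y_pred below are made-up stand-ins of shape (2, 4, 3) for the real (5, 47151, 10) data, and the names val_2d and pred_2d are illustrative only.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical toy data: 2 outputs, 4 examples, 3 classes
y_val = np.zeros((2, 4, 3), dtype=np.float32)
y_val[0, :, 1] = 1.0
y_val[1, :, 2] = 1.0

# Start from a perfect prediction, then corrupt examples 1 and 3
y_pred = y_val.astype(np.float64)  # astype returns a copy
y_pred[0, 1, 1] = 0.0
y_pred[0, 1, 0] = 1.0
y_pred[1, 3, 2] = 0.0

# (2, 4, 3) -> (4, 2, 3) -> (4, 6): one row per example
n_examples = y_val.shape[1]
val_2d = y_val.transpose(1, 0, 2).reshape(n_examples, -1)
pred_2d = y_pred.transpose(1, 0, 2).reshape(n_examples, -1)

# Subset accuracy: fraction of rows that match in every entry
acc = accuracy_score(val_2d, pred_2d)
print(acc)  # 0.5
```

Two of the four toy examples match completely, hence 0.5.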
Sounds like you want something like this:
accuracy = (np.array(y_pred_list) == np.array(y_val_lst)).all(axis=(0, 2)).mean()
...though since your arrays are clearly floating-point arrays, you might want to allow for numerical-precision errors rather than insisting on exact equality:
accuracy = (np.abs(np.array(y_pred_list) - np.array(y_val_lst)) < tolerance).all(axis=(0, 2)).mean()
(where, for example, tolerance = 1e-10)
The .all(axis=(0,2)) call records cases in which everything in its input is True (i.e. everything matches) when working along dimension 0 (i.e. the one that has extent 5) and dimension 2 (the one that has extent 10). It outputs a one-dimensional array of length 47151. The .mean() call then gives you the proportion of matches in that sequence, which is my best guess as to what you mean by "over 47151".
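To make the reduction concrete, here is a small sketch on made-up data. The arrays y_val and y_pred are hypothetical stand-ins of shape (2, 4, 3) rather than (5, 47151, 10), and the tolerance value is just the example figure mentioned above.

```python
import numpy as np

# Hypothetical toy data: 2 outputs, 4 examples, 3 classes
y_val = np.zeros((2, 4, 3), dtype=np.float32)
y_val[0, :, 1] = 1.0
y_val[1, :, 2] = 1.0

# Start from a perfect prediction, then corrupt examples 1 and 3
y_pred = y_val.astype(np.float64)  # astype returns a copy
y_pred[0, 1, 1] = 0.0
y_pred[0, 1, 0] = 1.0
y_pred[1, 3, 2] = 0.0

tolerance = 1e-10

# Elementwise comparison keeps the full (2, 4, 3) shape
close = np.abs(y_pred - y_val) < tolerance

# Reduce over axes 0 and 2, leaving one boolean per example
per_example = close.all(axis=(0, 2))
print(per_example.tolist())  # [True, False, True, False]

# The mean of a boolean array is the fraction of True entries
accuracy = per_example.mean()
print(accuracy)  # 0.5
```

An example counts as a match only if every entry across both reduced axes agrees, so a single wrong entry marks that whole example as incorrect.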