I have been running this LSTM tutorial on the wikigold.conll NER data set.
training_data contains a list of (sentence, tags) tuples, for example:
training_data = [
("They also have a song called \" wake up \"".split(), ["O", "O", "O", "O", "O", "O", "I-MISC", "I-MISC", "I-MISC", "I-MISC"]),
("Major General John C. Scheidt Jr.".split(), ["O", "O", "I-PER", "I-PER", "I-PER", "I-PER"])
]
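prepare_sequence needs a word_to_ix mapping (and, for encoding the gold tags, a tag_to_ix mapping), which the PyTorch tutorial builds by walking over the training data. Below is a minimal sketch of the same idea for data in this format; build_vocab is a hypothetical helper name, and the length check is an addition that is not part of the tutorial.

```python
def build_vocab(training_data):
    """Build word_to_ix and tag_to_ix from a list of (sentence, tags) pairs."""
    word_to_ix, tag_to_ix = {}, {}
    for sentence, tags in training_data:
        # every word needs exactly one tag, otherwise inputs and targets won't line up
        if len(sentence) != len(tags):
            raise ValueError(f"{len(sentence)} words but {len(tags)} tags: {sentence}")
        for word in sentence:
            # give each word the next free index the first time it is seen
            word_to_ix.setdefault(word, len(word_to_ix))
        for tag in tags:
            tag_to_ix.setdefault(tag, len(tag_to_ix))
    return word_to_ix, tag_to_ix
```

Calling build_vocab(training_data) raises ValueError for any sentence whose tag list has a different length. For example, "Major General John C. Scheidt Jr.".split() yields six tokens, so that sentence needs six tags.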
I wrote this function:
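import torch

def predict(indices):
    """Takes a list of indices into training_data and yields a tensor of predicted tag indices for each sentence"""
    for index in indices:
        inputs = prepare_sequence(training_data[index][0], word_to_ix)
        # no gradients are needed for inference
        with torch.no_grad():
            tag_scores = model(inputs)
        # torch.max along dim 1 returns (max scores, argmax indices); the indices are the predicted tags
        values, target = torch.max(tag_scores, 1)
        yield target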
This way I can get the predicted labels for specific indices in the training data.
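torch.max(tag_scores, 1) returns a (values, indices) pair, and the indices are tag IDs rather than tag strings. A sketch of mapping them back to tags follows, written with NumPy's argmax as a stand-in for the index half of torch.max; decode is a hypothetical name, and tag_to_ix is assumed to be the tag-to-index mapping from the tutorial.

```python
import numpy as np

def decode(tag_scores, tag_to_ix):
    """Turn a (num_words, num_tags) score matrix into a list of tag strings."""
    # invert the tag -> index mapping so predicted indices can be turned back into tags
    ix_to_tag = {ix: tag for tag, ix in tag_to_ix.items()}
    # argmax over axis 1 matches the index half of torch.max(tag_scores, 1)
    best = np.argmax(np.asarray(tag_scores), axis=1)
    return [ix_to_tag[int(i)] for i in best]
```

A score tensor that still requires gradients has to be detached before NumPy can read it, e.g. decode(tag_scores.detach(), tag_to_ix).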
However, how do I evaluate the accuracy score across all of the training data?
Here, accuracy means the number of correctly classified words across all sentences divided by the total word count.
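# predict() takes indices, so pass every index of training_data
y_pred = list(predict(range(len(training_data))))
# encode the gold tags with tag_to_ix so they can be compared with the predicted indices
y_true = [prepare_sequence(t, tag_to_ix) for s, t in training_data]
c = 0
s = 0
for i in range(len(training_data)):
    n = len(y_true[i])
    # super ugly and inefficient
    s += (sum(sum(list(y_true[i].view(-1, n) == y_pred[i].view(-1, n).data))))
    c += n
print('Training accuracy: {a}'.format(a=float(s) / c))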
P.S. I have been trying to use sklearn's accuracy_score, without success.
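For reference, the accuracy defined in the question (correct words / total words) can be computed by first flattening the per-sentence lists into one flat list each. sklearn.metrics.accuracy_score also expects flat 1-D sequences of labels rather than a list of per-sentence lists, so the missing flattening step may be why passing the nested lists to it did not work. A plain-Python sketch, with token_accuracy as a hypothetical name:

```python
def token_accuracy(y_true, y_pred):
    """Per-word accuracy over all sentences.

    y_true, y_pred: lists of sentences, each a sequence of tags in the same encoding.
    """
    # flatten the nested per-sentence lists into one flat list of tags each
    flat_true = [tag for sentence in y_true for tag in sentence]
    flat_pred = [tag for sentence in y_pred for tag in sentence]
    if len(flat_true) != len(flat_pred):
        raise ValueError("y_true and y_pred contain different numbers of words")
    # True counts as 1 in sum(), so this is the number of matching words
    correct = sum(t == p for t, p in zip(flat_true, flat_pred))
    return correct / len(flat_true)
```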
To calculate the loss for each epoch, divide running_loss by the number of batches and append the result to train_losses at the end of every epoch. Accuracy is the number of correct classifications divided by the total number of classifications.
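A minimal sketch of that bookkeeping, in plain Python so it runs on its own. The (batch_loss, n_correct, n_total) input format and the function name track_epoch_metrics are hypothetical stand-ins for whatever a real training loop produces; in PyTorch, batch_loss would typically come from loss.item().

```python
def track_epoch_metrics(epochs):
    """epochs: one list per epoch of (batch_loss, n_correct, n_total) tuples (hypothetical format)."""
    train_losses, train_accs = [], []
    for batches in epochs:
        running_loss, correct, total = 0.0, 0, 0
        for batch_loss, n_correct, n_total in batches:
            running_loss += batch_loss
            correct += n_correct
            total += n_total
        # mean loss per batch for this epoch
        train_losses.append(running_loss / len(batches))
        # correct classifications / total classifications
        train_accs.append(correct / total)
    return train_losses, train_accs
```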
Gates can be viewed as combinations of neural network layers and pointwise operations. If you don't already know how LSTMs work, the maths is straightforward, and the fundamental LSTM equations are available in the PyTorch docs. There are many great resources online, such as this one.
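To make the "layers plus pointwise operations" view concrete, here is a NumPy sketch of a single LSTM time step using the gate equations from the torch.nn.LSTM docs. The stacked weight layout (input, forget, cell and output gate blocks, each hidden_size rows) mirrors how nn.LSTM stores weight_ih_l0 and weight_hh_l0, but lstm_cell and its arguments are illustrative names, not a PyTorch API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h, c, W_ih, W_hh, b_ih, b_hh):
    """One LSTM step. W_ih: (4*hidden, input), W_hh: (4*hidden, hidden), biases: (4*hidden,)."""
    # the "layer" part: one affine map producing all four gate pre-activations
    gates = W_ih @ x + b_ih + W_hh @ h + b_hh
    # split in nn.LSTM's order: input, forget, cell (candidate), output
    i, f, g, o = np.split(gates, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    # the pointwise part: element-wise products and activations
    c_next = f * c + i * g
    h_next = o * np.tanh(c_next)
    return h_next, c_next
```

The single matrix product computing all four gates is the neural network layer; the sigmoid/tanh activations and the element-wise products that form c_next and h_next are the pointwise operations.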
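I would use NumPy so that the comparison does not iterate over the list in pure Python.
The results are the same, but it runs much faster. This assumes y_true and y_pred encode the tags the same way, for example both as tag indices.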
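import numpy as np

def accuracy_score(y_true, y_pred):
    # flatten the per-sentence tensors/lists into single 1-D arrays of tag indices
    y_pred = np.concatenate([np.asarray(p) for p in y_pred])
    y_true = np.concatenate([np.asarray(t) for t in y_true]).reshape(y_pred.shape)
    # number of matching words divided by the total number of words
    return (y_true == y_pred).sum() / float(len(y_true))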
And this is how to use it:
# predictions and gold tags as in the question, both encoded as tag indices:
y_pred = list(predict(range(len(training_data))))
y_true = [prepare_sequence(t, tag_to_ix) for s, t in training_data]
# NumPy accuracy score
print(accuracy_score(y_true, y_pred))