I am training a simple model in Keras for an NLP task with the following code. Variable names are self-explanatory for the train, test, and validation sets. The dataset has 19 classes, so the final layer of the network has 19 outputs. Labels are also one-hot encoded.
nb_classes = 19

model1 = Sequential()
model1.add(Embedding(nb_words, EMBEDDING_DIM,
                     weights=[embedding_matrix],
                     input_length=MAX_SEQUENCE_LENGTH,
                     trainable=False))
model1.add(LSTM(num_lstm, dropout=rate_drop_lstm, recurrent_dropout=rate_drop_lstm))
model1.add(Dropout(rate_drop_dense))
model1.add(BatchNormalization())
model1.add(Dense(num_dense, activation=act))
model1.add(Dropout(rate_drop_dense))
model1.add(BatchNormalization())
model1.add(Dense(nb_classes, activation='sigmoid'))
model1.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# One-hot encode all labels
ytrain_enc = np_utils.to_categorical(train_labels)
yval_enc = np_utils.to_categorical(val_labels)
ytestenc = np_utils.to_categorical(test_labels)

model1.fit(train_data, ytrain_enc,
           validation_data=(val_data, yval_enc),
           epochs=200, batch_size=384, shuffle=True, verbose=1)
After the first epoch, this gives the following output.
Epoch 1/200
216632/216632 [==============================] - 2442s - loss: 0.1427 - acc: 0.9443 - val_loss: 0.0526 - val_acc: 0.9826
Then I evaluate the model on the test dataset, and this also shows an accuracy of around 0.98.
model1.evaluate(test_data, y = ytestenc, batch_size=384, verbose=1)
However, the labels are one-hot encoded, so I need a vector of predicted classes in order to generate a confusion matrix, etc. So I use:
PREDICTED_CLASSES = model1.predict_classes(test_data, batch_size=384, verbose=1)
temp = sum(test_labels == PREDICTED_CLASSES)
temp/len(test_labels)
# 0.83
This shows that the predicted classes are only 83% accurate, yet model1.evaluate shows 98% accuracy! What am I doing wrong here? Is my loss function okay for categorical class labels? Is my choice of the sigmoid activation function for the prediction layer okay? Or is there a difference in the way Keras evaluates a model? Please suggest what might be wrong. This is my first attempt at building a deep model, so I don't have much understanding of what could be going wrong here.
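For reference, a minimal sketch of one way to build the confusion matrix mentioned above, reusing the variable names from the question; the scikit-learn import and confusion_matrix call are not part of the original post:

from sklearn.metrics import confusion_matrix

# Integer class predictions from the Sequential model
PREDICTED_CLASSES = model1.predict_classes(test_data, batch_size=384, verbose=1)
# Rows are true classes, columns are predicted classes; shape (19, 19)
cm = confusion_matrix(test_labels, PREDICTED_CLASSES)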
predict() returns the final output of the model, i.e. the predictions themselves, while model.evaluate() returns the loss (plus any compiled metrics). The loss is what is used to train the model (via backpropagation); it is not the answer itself.
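As a quick illustration of that difference, here is a sketch reusing the variable names from the question (not part of the original answer):

probs = model1.predict(test_data, batch_size=384)        # per-class scores, shape (n_samples, 19)
pred_classes = probs.argmax(axis=-1)                     # integer class labels derived from predict()
loss, acc = model1.evaluate(test_data, ytestenc,
                            batch_size=384, verbose=0)   # loss plus the compiled accuracy metric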
Evaluating models can be streamlined through a couple of simple methods that yield statistics you can reference later. If you have ever read a research paper, you have come across a model's accuracy, weighted accuracy, recall (sensitivity), specificity, or precision.
Accuracy is a metric used in classification problems to express the percentage of correct predictions. We calculate it by dividing the number of correct predictions by the total number of predictions. This formula gives an easy-to-understand definition that assumes a binary classification problem.
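A tiny numerical example of that formula, using hypothetical labels that are not from the post:

import numpy as np

y_true = np.array([0, 2, 1, 1, 0])
y_pred = np.array([0, 2, 0, 1, 0])
accuracy = np.mean(y_true == y_pred)   # 4 correct out of 5 predictions -> 0.8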
Evaluation is a process during model development to check whether the model is a good fit for the given problem and data. A Keras model provides an evaluate function, which performs the evaluation of the model. It takes three main arguments: the test data, the test labels, and the verbosity.
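For example, a minimal evaluate() call in the style of the question (the verbose value here is an arbitrary choice, not from the original post):

score = model1.evaluate(test_data, ytestenc, verbose=0)
print(model1.metrics_names)   # e.g. ['loss', 'acc']
print(score)                  # the corresponding loss and metric values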
I have found the problem. metrics=['accuracy'] chooses the accuracy metric automatically based on the cost function. So with binary_crossentropy it reports binary accuracy, not categorical accuracy. Using categorical_crossentropy automatically switches to categorical accuracy, and then it matches the accuracy calculated manually from model1.predict(). Yu-Yang was right to point out the cost function and activation function for a multi-class problem.
P.S.: One can get both categorical and binary accuracy by using metrics=['binary_accuracy', 'categorical_accuracy'].
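A minimal sketch of the corrected setup for a multi-class problem, with a softmax output, categorical_crossentropy, and both accuracy metrics logged; the layer sizes and input shape here are placeholders, not the model from the question:

from keras.models import Sequential
from keras.layers import Dense

nb_classes = 19
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(128,)))   # placeholder hidden layer and feature size
model.add(Dense(nb_classes, activation='softmax'))            # softmax for mutually exclusive classes
model.compile(loss='categorical_crossentropy',                # makes 'accuracy' resolve to categorical accuracy
              optimizer='adam',
              metrics=['binary_accuracy', 'categorical_accuracy'])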