 

Matthews Correlation Coefficient with Keras

I have a Keras model (Sequential) in Python 3:

import keras

class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.matthews_correlation = []

    def on_epoch_end(self, epoch, logs={}):
        # store the metric Keras reports for this epoch
        self.matthews_correlation.append(logs.get('matthews_correlation'))
...
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['matthews_correlation'])
history = LossHistory()
model.fit(Xtrain, Ytrain, nb_epoch=10, batch_size=10, callbacks=[history])
scores = model.evaluate(Xtest, Ytest, verbose=1)

...
MCC = matthews_correlation(Ytest, predictions)

model.fit() prints out - supposedly because of the metrics=['matthews_correlation'] part - a progress bar and a Matthews Correlation Coefficient (MCC) per epoch. But these values are rather different from what the MCC call at the very end returns. That final call gives the overall MCC of the predictions and is consistent with sklearn's MCC function (i.e. I trust that value).
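For reference, the sklearn value I am comparing against comes from something like this (a sketch; I assume binary labels and threshold the predictions at 0.5):

from sklearn.metrics import matthews_corrcoef

# threshold the raw model outputs to {0, 1} before comparing with the labels
predictions = (model.predict(Xtest).ravel() > 0.5).astype(int)
print(matthews_corrcoef(Ytest, predictions))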

1) What are the scores returned by model.evaluate()? They are completely different from both the final MCC and the per-epoch MCCs.

2) What are the MCCs from the epochs? It looks like this:

Epoch 1/10 580/580 [===========] - 0s - loss: 0.2500 - matthews_correlation: -0.5817

How are they calculated and why do they differ so much from the MCC in the very end?

3) Can I somehow call the matthews_correlation() function myself inside on_epoch_end()? Then I could print an independently calculated MCC per epoch, since I don't know what Keras does implicitly.
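Something along these lines is what I imagine (an untested sketch; the class name and the 0.5 threshold are my own assumptions, and it uses sklearn inside the callback):

import keras
from sklearn.metrics import matthews_corrcoef

class IndependentMCC(keras.callbacks.Callback):
    def __init__(self, Xval, Yval):
        super().__init__()
        self.Xval = Xval
        self.Yval = Yval
        self.mcc = []

    def on_epoch_end(self, epoch, logs={}):
        # predict on the whole validation set and compute the MCC outside of Keras
        preds = (self.model.predict(self.Xval).ravel() > 0.5).astype(int)
        self.mcc.append(matthews_corrcoef(self.Yval, preds))
        print(' - independently calculated MCC: %.4f' % self.mcc[-1])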

Thanks for your help.

Edit: The LossHistory class above is based on the example of how to record a history of losses. If I print(history.matthews_correlation), I get a list of the same MCCs that the progress report gives me.

asked Oct 06 '16 by ste


People also ask

How do you find the Matthew correlation coefficient in Python?

To calculate the MCC of a model, we can use the following formula:

MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))

For example, with TP = 15, TN = 375, FP = 5 and FN = 5:

MCC = (15*375 - 5*5) / sqrt((15+5)(15+5)(375+5)(375+5)) = 0.7368
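The same numbers can be checked with a few lines of Python (my own sketch, just to verify the arithmetic above):

import math

TP, TN, FP, FN = 15, 375, 5, 5
mcc = (TP * TN - FP * FN) / math.sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
print(round(mcc, 4))  # 0.7368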

How do you interpret Matthews Correlation Coefficient?

The Matthews paper (www.sciencedirect.com/science/article/pii/0005279575901099) describes the following: "A correlation of: C = 1 indicates perfect agreement, C = 0 is expected for a prediction no better than random, and C = -1 indicates total disagreement between prediction and observation."

What is a good Matthews correlation?

A Matthews correlation coefficient close to +1, in fact, means having high values for all the other confusion matrix metrics. The same cannot be said for balanced accuracy, markedness, bookmaker informedness, accuracy and F1 score.


1 Answer

The reason your MCC is negative might be a bug that was recently fixed in the Keras implementation. Check this issue.

The solution to your problem could be to reinstall Keras from the GitHub master branch, or to define the metric yourself (as described here), using the fixed implementation from the issue:

import keras.backend as K

def matthews_correlation(y_true, y_pred):
    # threshold predictions and labels to {0, 1}
    y_pred_pos = K.round(K.clip(y_pred, 0, 1))
    y_pred_neg = 1 - y_pred_pos

    y_pos = K.round(K.clip(y_true, 0, 1))
    y_neg = 1 - y_pos

    # confusion-matrix counts
    tp = K.sum(y_pos * y_pred_pos)
    tn = K.sum(y_neg * y_pred_neg)

    fp = K.sum(y_neg * y_pred_pos)
    fn = K.sum(y_pos * y_pred_neg)

    numerator = (tp * tn - fp * fn)
    denominator = K.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    # K.epsilon() guards against division by zero
    return numerator / (denominator + K.epsilon())
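
To use it, pass the function object itself (rather than the 'matthews_correlation' string) when compiling; a short sketch reusing the names from the question:

model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=[matthews_correlation])
model.fit(Xtrain, Ytrain, nb_epoch=10, batch_size=10, callbacks=[history])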
answered Nov 15 '22 by Matt07