How do I calculate the Matthews correlation coefficient in TensorFlow?

Tags: machine-learning, tensorflow, python-2.7, tf.keras

So I made a model with TensorFlow Keras and it seems to work OK. However, my supervisor said it would be useful to calculate the Matthews correlation coefficient as well as the accuracy and loss it already reports.

My model is very similar to the code in the tutorial here (https://www.tensorflow.org/tutorials/keras/basic_classification), except with a much smaller dataset.

Is there a prebuilt function, or would I have to get the prediction for each test example and calculate it by hand?


asked Jul 03 '19 07:07

Toby Peterken

2 Answers

There is nothing built into tf.keras for this, but we can calculate it from the formula in a custom metric.
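For reference, this is the standard binary MCC written in terms of the confusion-matrix counts (true positives TP, false positives FP, true negatives TN, false negatives FN). It ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it is the same expression the metric function computes:

```latex
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}
                    {\sqrt{(TP + FP)\,(TP + FN)\,(TN + FP)\,(TN + FN)}}
```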

The basic classification link you supplied is for a multi-class categorisation problem, whereas the Matthews correlation coefficient in its usual form is defined for binary classification problems.

Assuming your model is structured in the "normal" way for such problems (i.e. y_pred is a number between 0 and 1 for each record, representing the predicted probability of "True", and each label is exactly 0 or 1, representing ground truth "False" or "True" respectively), we can add an MCC metric as follows:

import tensorflow as tf

# if y_pred > threshold we predict true.
# Sometimes we set this to something other than 0.5 if we have unbalanced categories
threshold = 0.5

def mcc_metric(y_true, y_pred):
  # labels may arrive as integers; cast so the arithmetic below has matching dtypes
  y_true = tf.cast(y_true, tf.float32)
  predicted = tf.cast(tf.greater(y_pred, threshold), tf.float32)
  true_pos = tf.math.count_nonzero(predicted * y_true)
  true_neg = tf.math.count_nonzero((predicted - 1) * (y_true - 1))
  false_pos = tf.math.count_nonzero(predicted * (y_true - 1))
  false_neg = tf.math.count_nonzero((predicted - 1) * y_true)
  x = tf.cast((true_pos + false_pos) * (true_pos + false_neg)
      * (true_neg + false_pos) * (true_neg + false_neg), tf.float32)
  # divide_no_nan returns 0 instead of NaN when a batch contains only one class
  return tf.math.divide_no_nan(
      tf.cast((true_pos * true_neg) - (false_pos * false_neg), tf.float32),
      tf.sqrt(x))

which we can include in our model.compile call:

model.compile(optimizer='adam',
              loss=tf.keras.losses.binary_crossentropy,
              metrics=['accuracy', mcc_metric])
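The four count_nonzero calls in mcc_metric rely on a small arithmetic trick: with 0/1 values, each product is non-zero for exactly one combination of prediction and label, so counting non-zero products counts one cell of the confusion matrix. The sketch below mirrors that logic in plain Python so it can be checked without TensorFlow; confusion_counts is a hypothetical helper for illustration, not part of any library:

```python
# Pure-Python mirror of the count_nonzero tricks in mcc_metric (illustration only).
# With 0/1 values, each product below is non-zero for exactly one
# (predicted, true) combination, so counting non-zero products counts that cell.
def confusion_counts(predicted, y_true):
    """Return (tp, tn, fp, fn) for two equal-length sequences of 0/1 values."""
    pairs = list(zip(predicted, y_true))
    tp = sum(1 for p, t in pairs if p * t != 0)              # p=1, t=1
    tn = sum(1 for p, t in pairs if (p - 1) * (t - 1) != 0)  # p=0, t=0
    fp = sum(1 for p, t in pairs if p * (t - 1) != 0)        # p=1, t=0
    fn = sum(1 for p, t in pairs if (p - 1) * t != 0)        # p=0, t=1
    return tp, tn, fp, fn

print(confusion_counts([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))  # (2, 2, 1, 1)
```

Every pair lands in exactly one of the four counts, which is why the counts always add up to the batch size.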

Example

Here is a complete worked example where we categorise MNIST digits according to whether they are greater than 4:

import tensorflow as tf

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# binary labels: 1 if the digit is greater than 4, otherwise 0
y_train, y_test = 0 + (y_train > 4), 0 + (y_test > 4)

def mcc_metric(y_true, y_pred):
  y_true = tf.cast(y_true, tf.float32)
  predicted = tf.cast(tf.greater(y_pred, 0.5), tf.float32)
  true_pos = tf.math.count_nonzero(predicted * y_true)
  true_neg = tf.math.count_nonzero((predicted - 1) * (y_true - 1))
  false_pos = tf.math.count_nonzero(predicted * (y_true - 1))
  false_neg = tf.math.count_nonzero((predicted - 1) * y_true)
  x = tf.cast((true_pos + false_pos) * (true_pos + false_neg)
      * (true_neg + false_pos) * (true_neg + false_neg), tf.float32)
  # divide_no_nan returns 0 instead of NaN for a batch that contains only one class
  return tf.math.divide_no_nan(
      tf.cast((true_pos * true_neg) - (false_pos * false_neg), tf.float32),
      tf.sqrt(x))

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='relu'),
  tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.binary_crossentropy,
              metrics=['accuracy', mcc_metric])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

output:

Epoch 1/5
60000/60000 [==============================] - 7s 113us/sample - loss: 0.1391 - acc: 0.9483 - mcc_metric: 0.8972
Epoch 2/5
60000/60000 [==============================] - 6s 96us/sample - loss: 0.0722 - acc: 0.9747 - mcc_metric: 0.9495
Epoch 3/5
60000/60000 [==============================] - 6s 97us/sample - loss: 0.0576 - acc: 0.9797 - mcc_metric: 0.9594
Epoch 4/5
60000/60000 [==============================] - 6s 96us/sample - loss: 0.0479 - acc: 0.9837 - mcc_metric: 0.9674
Epoch 5/5
60000/60000 [==============================] - 6s 95us/sample - loss: 0.0423 - acc: 0.9852 - mcc_metric: 0.9704
10000/10000 [==============================] - 1s 58us/sample - loss: 0.0582 - acc: 0.9818 - mcc_metric: 0.9639
[0.05817381642502733, 0.9818, 0.9638971]
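One caveat, based on how Keras treats metric functions as I understand it: mcc_metric returns one value per batch, and Keras reports an average of those per-batch values, so the mcc_metric numbers in the output above are averages of per-batch MCCs rather than the MCC of the whole training or test set. A batch that contains only one class also makes the denominator zero. A minimal pure-Python sketch (the helper mcc_from_probs and the data are made up for illustration) shows how far the two can diverge:

```python
from math import sqrt

def mcc_from_probs(y_true, probs, threshold=0.5):
    """Binary MCC over all given examples; 0.0 when the denominator is zero."""
    predicted = [1 if p > threshold else 0 for p in probs]  # same rule as mcc_metric
    pairs = list(zip(predicted, y_true))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Made-up labels and probabilities, viewed as two "batches" of 4 examples.
y_true = [1, 0, 1, 0,   0, 0, 0, 0]
probs = [0.9, 0.2, 0.7, 0.1,   0.3, 0.4, 0.2, 0.1]

batch_size = 4
batch_scores = [mcc_from_probs(y_true[i:i + batch_size], probs[i:i + batch_size])
                for i in range(0, len(y_true), batch_size)]
print(batch_scores)                           # [1.0, 0.0]
print(sum(batch_scores) / len(batch_scores))  # 0.5, the average over batches
print(mcc_from_probs(y_true, probs))          # 1.0, the MCC over all 8 examples
```

Every prediction here is correct, yet the batch average is 0.5 because the second batch has no positives. With the real model, probs would be something like model.predict(x_test).ravel() and y_true would be y_test; scikit-learn's sklearn.metrics.matthews_corrcoef applied to y_test and the thresholded predictions is another way to get the whole-set value.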


answered Nov 03 '22 00:11

Stewart_R

Since the asker accepted a Python version from sklearn, here is Stewart_R's answer in pure Python:

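from math import sqrt

def mcc(tp, fp, tn, fn):
    # https://stackoverflow.com/a/56875660/992687
    x = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    # return 0.0 instead of raising ZeroDivisionError when a row or column of the confusion matrix is empty
    return ((tp * tn) - (fp * fn)) / sqrt(x) if x else 0.0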

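It has the advantage of being framework-independent: it works on confusion-matrix counts from any source, not only on a TensorFlow model's predictions, although it is still the binary form of the MCC.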
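Because this version only needs the four counts, they can come from anywhere. A small usage sketch, assuming made-up label lists and counting (predicted, actual) pairs with collections.Counter; this copy of mcc also returns 0.0 instead of dividing by zero when a row or column of the confusion matrix is empty:

```python
from collections import Counter
from math import sqrt

def mcc(tp, fp, tn, fn):
    # Same formula as above; returns 0.0 instead of dividing by zero
    # when any row or column of the confusion matrix is empty.
    x = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    return ((tp * tn) - (fp * fn)) / sqrt(x) if x else 0.0

# Hypothetical labels; these could come from any model, not only TensorFlow.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]

counts = Counter(zip(y_pred, y_true))  # keys are (predicted, actual) pairs
tp, fp = counts[(1, 1)], counts[(1, 0)]
tn, fn = counts[(0, 0)], counts[(0, 1)]
print((tp, fp, tn, fn))     # (3, 1, 3, 1)
print(mcc(tp, fp, tn, fn))  # 0.5
```

A Counter returns 0 for a missing key, so a cell that never occurs simply counts as zero.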

answered Nov 03 '22 01:11

The Unfun Cat


