I am using my test set as a validation set. I used a similar approach to the one in How to compute Receiving Operating Characteristic (ROC) and AUC in keras?
The issue is that my val_auc during training is around 0.85; however, when I use
fpr, tpr, _ = roc_curve(test_label, test_prediction)
roc_auc = auc(fpr, tpr)
I get an AUC of 0.60. I understand that they use different formulations, and the streaming AUC might differ from the one scikit-learn calculates, but the difference is very large and I can't figure out what causes it.
# define roc_callback, inspired by https://github.com/keras-team/keras/issues/6050#issuecomment-329996505
def auc_roc(y_true, y_pred):
    # any tensorflow metric
    value, update_op = tf.contrib.metrics.streaming_auc(y_pred, y_true)
    # find all variables created for this metric
    metric_vars = [i for i in tf.local_variables() if 'auc_roc' in i.name.split('/')[1]]
    # Add metric variables to GLOBAL_VARIABLES collection.
    # They will be initialized for new session.
    for v in metric_vars:
        tf.add_to_collection(tf.GraphKeys.GLOBAL_VARIABLES, v)
    # force to update metric values
    with tf.control_dependencies([update_op]):
        value = tf.identity(value)
    return value
clf = Sequential()
clf.add(LSTM(units = 128, input_shape = (windowlength, trainX.shape[2]), return_sequences = True))#, kernel_regularizer=regularizers.l2(0.01)))
clf.add(Dropout(0.2))
clf.add(LSTM(units = 64, return_sequences = False))#, kernel_regularizer=regularizers.l2(0.01)))
clf.add(Dropout(0.2))
clf.add(Dense(units = 128, activation = 'relu'))
clf.add(Dropout(0.2))
clf.add(Dense(units = 128, activation = 'relu'))
clf.add(Dense(units = 1, activation = 'sigmoid'))
clf.compile(loss='binary_crossentropy', optimizer = 'adam', metrics = ['acc', auc_roc])
my_callbacks = [EarlyStopping(monitor='auc_roc', patience=50, verbose=1, mode='max')]
clf.fit(trainX, trainY, batch_size = 1000, epochs = 80, class_weight = class_weights, validation_data = (testX, testY),
        verbose = 2, callbacks=my_callbacks)
y_pred_pro = clf.predict_proba(testX)
print(roc_auc_score(testY, y_pred_pro))
I would really appreciate it if anyone could point me in the right direction.
First of all, tf.contrib.metrics.streaming_auc is deprecated; use tf.metrics.auc instead.
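As a sketch (assuming TF 1.x graph mode and keeping your variable-collection workaround unchanged), the metric above could be rewritten like this; note that tf.metrics.auc takes labels first and predictions second, the opposite order of streaming_auc:

import tensorflow as tf

def auc_roc(y_true, y_pred):
    # tf.metrics.auc expects (labels, predictions) -- the reverse of streaming_auc
    value, update_op = tf.metrics.auc(y_true, y_pred)
    # collect the local variables created by this metric so they get initialized
    metric_vars = [v for v in tf.local_variables() if 'auc_roc' in v.name.split('/')[1]]
    for v in metric_vars:
        tf.add_to_collection(tf.GraphKeys.GLOBAL_VARIABLES, v)
    # make sure the running counts are updated before the value is read
    with tf.control_dependencies([update_op]):
        value = tf.identity(value)
    return value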
As you have mentioned, TF uses a different method to calculate the AUC than Scikit-learn.
TF uses an approximate method. Quoting its documentation:
To discretize the AUC curve, a linearly spaced set of thresholds is used to compute pairs of recall and precision values.
This will almost always give a higher AUC score than the actual score. Furthermore, the num_thresholds parameter defaults to 200, which is low if your dataset is large. Increasing it should make the score more accurate, but no matter how high you set it, there will always be some error.
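For example, the only change needed inside the metric is the extra argument (2000 here is purely illustrative, not a recommendation; higher values cost more memory):

value, update_op = tf.metrics.auc(y_true, y_pred, num_thresholds=2000)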
Scikit-learn, on the other hand, computes the "true" AUC score using a different method.
I don't know exactly why TF uses an approximate method, but I would guess it's because it is much more memory-efficient and faster. Also, although it overestimates the score, it is very likely to preserve the relative order of models: if one model has a better approximate AUC than another, then its true AUC will very probably be better as well.
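If you want to see the size of the gap yourself, you can evaluate both scores on the same held-out predictions. A rough sketch, assuming testY and y_pred_pro are the NumPy arrays from your code above and TF 1.x:

import numpy as np
import tensorflow as tf
from sklearn.metrics import roc_auc_score

# "true" AUC from scikit-learn
print(roc_auc_score(np.ravel(testY), np.ravel(y_pred_pro)))

# approximate AUC from TF on the same predictions
labels = tf.constant(np.ravel(testY), dtype=tf.float32)
preds = tf.constant(np.ravel(y_pred_pro), dtype=tf.float32)
value, update_op = tf.metrics.auc(labels, preds, num_thresholds=200)
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    sess.run(update_op)       # accumulate TP/FP/TN/FN counts at each threshold
    print(sess.run(value))    # trapezoidal AUC over the discretized curve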