
How to plot a ROC curve with Tensorflow and scikit-learn?

I'm trying to plot the ROC curve from a modified version of the CIFAR-10 example provided by TensorFlow. It's now for 2 classes instead of 10.

The outputs of the network are called logits and take the form:

[[-2.57313061  2.57966399]
 [ 0.04221377 -0.04033273]
 [-1.42880082  1.43337202]
 [-2.7692945   2.78173304]
 [-2.48195744  2.49331546]
 [ 2.0941515  -2.10268974]
 [-3.51670194  3.53267646]
 [-2.74760485  2.75617766]
 ...]

First of all, what do these logits actually represent? The final layer in the network is a "softmax linear" of form WX+b.

The model is able to calculate accuracy by calling

top_k_op = tf.nn.in_top_k(logits, labels, 1)

Then once the graph has been initialized:

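predictions = sess.run([top_k_op])  # booleans: True where the top-1 prediction matches the label
predictions_int = np.array(predictions).astype(int)
true_count += np.sum(predictions_int)  # number of correct predictions in this batch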
...
precision = true_count / total_sample_count

This works fine.

But now how can I plot a ROC curve from this?

I've been trying the "sklearn.metrics.roc_curve()" function (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve) but I don't know what to use as my "y_score" parameter.

Any help would be appreciated!

jwsmithers asked Apr 29 '16 13:04



1 Answer

'y_score' here should be an array containing, for each sample, the probability that it belongs to the positive class (the class labeled 1 in your y_true array).

If your network uses softmax as its last layer, the model should output a probability for each class for every instance. But the data you have given here doesn't match that format. I checked the example code (https://github.com/tensorflow/tensorflow/blob/r0.10/tensorflow/models/image/cifar10/cifar10.py), and it seems to use a layer called softmax_linear. I know little about this example, but I guess you should pass the output through something like a logistic function to turn it into probabilities.
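As a rough sketch of that conversion (not code from the CIFAR-10 example), a row-wise softmax in NumPy turns each pair of logits into two probabilities that sum to 1. The helper name `softmax` and the three sample rows are chosen here for illustration; the rows are copied from the logits in the question. For two classes, the softmax probability of class 1 equals the logistic function applied to the difference of the two logits, which is why "something like a logistic function" works. Inside the TensorFlow graph, `tf.nn.softmax(logits)` should give the same result.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax: each row of logits becomes probabilities summing to 1."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row maximum before exponentiating, for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# First three rows of the logits shown in the question.
logits = np.array([[-2.57313061, 2.57966399],
                   [0.04221377, -0.04033273],
                   [-1.42880082, 1.43337202]])

probs = softmax(logits)
y_score = probs[:, 1]  # probability of class 1, assumed to be the positive class
```

This assumes class 1 is the positive class; if class 0 is the positive one in your labels, take column 0 instead.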

Then just feed it along with your true label 'y_true' to the scikit-learn function:

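import numpy as np
from sklearn.metrics import roc_curve
y_score = np.array(output)[:, 1]  # column 1: probability of the positive class
fpr, tpr, thresholds = roc_curve(y_true, y_score)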
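To go from there to an actual plot, a minimal sketch could look like the following. The `y_true` and `y_score` arrays are made-up toy values standing in for your real labels and positive-class probabilities, and `plot_roc` is a hypothetical helper name, not part of scikit-learn. It uses `sklearn.metrics.roc_curve` and `sklearn.metrics.auc` together with matplotlib.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Toy data for illustration only: replace with your real labels and scores.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# roc_curve sweeps over thresholds and returns one (FPR, TPR) point per threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)  # area under the ROC curve

def plot_roc(fpr, tpr, roc_auc, path="roc.png"):
    """Hypothetical helper: draw the ROC curve and save it to `path`."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without matplotlib
    fig, ax = plt.subplots()
    ax.plot(fpr, tpr, label="ROC curve (AUC = %0.2f)" % roc_auc)
    ax.plot([0, 1], [0, 1], linestyle="--", label="chance")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.legend(loc="lower right")
    fig.savefig(path)
    return path
```

Calling `plot_roc(fpr, tpr, roc_auc)` writes the figure to `roc.png`. In the evaluation loop from the question you would collect the labels and positive-class scores from every batch (for example with `np.concatenate`) before calling `roc_curve` once on the full arrays.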
fin answered Oct 19 '22 22:10