
Library in python for neural networks to plot ROC, AUC, DET [closed]

I am new to machine learning in Python, so forgive my naive question. Is there a Python library for implementing neural networks that also gives me ROC and AUC curves? I know about Python libraries that implement neural networks, but I am searching for one that also helps me plot ROC, DET and AUC curves.

asked Apr 24 '12 by user1354510


2 Answers

In this case it makes sense to split your question into two topics, since ROC curves are not specific to neural networks.

Neural Networks

I think there's nothing better than learning by example, so I'll show you an approach to your problem using a binary classification problem trained by a feed-forward neural network, inspired by this tutorial from pybrain.

The first thing is to define a dataset. The easiest to visualize is a binary dataset on a 2D plane, with points generated from normal distributions, each belonging to one of the two classes. In this case the two classes will be linearly separable.

from pybrain.datasets            import ClassificationDataSet
from pybrain.utilities           import percentError
from pybrain.tools.shortcuts     import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules   import SoftmaxLayer

from pylab import ion, ioff, figure, draw, contourf, clf, show, hold, plot
from scipy import diag, arange, meshgrid, where
from numpy.random import multivariate_normal

# three means/covariances are defined, but only the first n_klass are used below
means = [(-1,0),(2,4),(3,1)]
cov = [diag([1,1]), diag([0.5,1.2]), diag([1.5,0.7])]
n_klass = 2
alldata = ClassificationDataSet(2, 1, nb_classes=n_klass)
for n in xrange(400):
    for klass in range(n_klass):
        input = multivariate_normal(means[klass],cov[klass])
        alldata.addSample(input, [klass])

Visualized, the dataset looks something like this: [scatter plot of the two classes]
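If you want to see the raw samples yourself, here is a minimal, untested sketch (assuming matplotlib via pylab, and noting that before _convertToOneOfMany() the class label sits in the 'target' field):

# Hypothetical visualization sketch, not part of the original answer
from pylab import scatter, legend, show

colors = ['b', 'r']
for klass in range(n_klass):
    idx, _ = where(alldata['target'] == klass)   # rows belonging to this class
    scatter(alldata['input'][idx, 0], alldata['input'][idx, 1],
            c=colors[klass], label='class %d' % klass)
legend()
show()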

Now split it into training and test sets:

tstdata, trndata = alldata.splitWithProportion(0.25)

# encode the classes with one output neuron per class (one-of-many / one-hot);
# the original class labels are preserved in the 'class' field
trndata._convertToOneOfMany()
tstdata._convertToOneOfMany()

And to create your network:

fnn = buildNetwork( trndata.indim, 5, trndata.outdim, outclass=SoftmaxLayer )

trainer = BackpropTrainer( fnn, dataset=trndata, momentum=0.1, verbose=True, weightdecay=0.01 )

# grid of points covering the input space, used later to plot the decision boundary
ticks = arange(-3.,6.,0.2)
X, Y = meshgrid(ticks, ticks)
# need column vectors in dataset, not arrays
griddata = ClassificationDataSet(2,1, nb_classes=n_klass)
for i in xrange(X.size):
    griddata.addSample([X.ravel()[i],Y.ravel()[i]], [0])
griddata._convertToOneOfMany()  # this is still needed to make the fnn feel comfy

Now you need to train your network and see what results you get in the end:

for i in range(20):
    trainer.trainEpochs( 1 )
    trnresult = percentError( trainer.testOnClassData(),
                              trndata['class'] )
    tstresult = percentError( trainer.testOnClassData(
           dataset=tstdata ), tstdata['class'] )

    print "epoch: %4d" % trainer.totalepochs, \
          "  train error: %5.2f%%" % trnresult, \
          "  test error: %5.2f%%" % tstresult

    out = fnn.activateOnDataset(griddata)
    out = out.argmax(axis=1)  # the highest output activation gives the class
    out = out.reshape(X.shape)

    figure(1)
    ioff()  # interactive graphics off
    clf()   # clear the plot
    hold(True) # overplot on
    for c in range(n_klass):
        here, _ = where(tstdata['class']==c)
        plot(tstdata['input'][here,0],tstdata['input'][here,1],'o')
    if out.max()!=out.min():  # safety check against flat field
        contourf(X, Y, out)   # plot the contour
    ion()   # interactive graphics on
    draw()  # update the plot

At first this gives you a very poor decision boundary: [train-start plot]

But by the end, a pretty good result:

[train-end plot]

ROC curves

As for ROC curves, here is a nice and simple Python library (PyROC), demonstrated on a random toy problem:

from pyroc import *
random_sample = random_mixture_model()   # generate a random toy dataset

# each sample is a (label, score) pair: the actual class label first
# (positive class should be +1, negative 0), the classifier's score second
roc = ROCData(random_sample)    # create the ROC object
roc.auc()                       # area under the curve
roc.plot(title='ROC Curve')     # plot the ROC curve

This gives you a single ROC curve: [single ROC curve plot]

Of course you can also plot multiple ROC curves on the same graph:

x = random_mixture_model()
r1 = ROCData(x)
y = random_mixture_model()
r2 = ROCData(y)
lista = [r1,r2]
plot_multiple_roc(lista,'Multiple ROC Curves',include_baseline=True)

[plot of multiple ROC curves]

(remember that the diagonal just means your classifier performs no better than random, and that you're probably doing something wrong)

You can easily use these modules in any of your classification tasks (not limited to neural networks), and they will produce ROC curves for you.

Now, to get the class probabilities needed to plot your ROC curve from your neural network, you just need to look at its activations: activateOnDataset in pybrain gives you the probability for both classes (in the example above we simply take the class with the highest probability). From there, just transform the output into the format expected by PyROC, as in random_mixture_model, and it should give you your ROC curve.
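As a rough, untested sketch of that conversion (assuming the fnn and tstdata from the first part of this answer, and treating class 1 as the positive class):

# Hypothetical sketch, not from the original answer: build (label, score)
# pairs for PyROC from the network's activations on the test set.
probs = fnn.activateOnDataset(tstdata)      # one row of class probabilities per sample
labels = tstdata['class'].ravel()           # true class labels (0 or 1)

# PyROC expects (actual_label, score) pairs; use the probability of class 1 as the score
pairs = [(int(labels[i]), probs[i, 1]) for i in xrange(len(labels))]

roc = ROCData(pairs)
print roc.auc()
roc.plot(title='ROC for the pybrain classifier')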

answered by Charles Menguy


Sure. First, check out this question:

https://stackoverflow.com/questions/2276933/good-open-source-neural-network-python-library

This is my general idea; I'm sketching out how I might approach this, and none of it is tested.

From http://pybrain.org/docs/tutorial/netmodcon.html#feed-forward-networks

>>> from pybrain.structure import FeedForwardNetwork
>>> n = FeedForwardNetwork()
>>> # ... add input/hidden/output layers and connections, then call
>>> # n.sortModules() before activating (see the linked tutorial)
>>> n.activate((2, 2))
array([-0.1959887])

We build a neural net, train it (not shown), and get the output. You have a test set, right? You use the test set to generate the data for the ROC curve. For a single-output neural net, you want to choose a threshold on the output values that translates them into yes/no responses and gives the best trade-off between specificity and sensitivity for your task.

This is a good tutorial http://webhome.cs.uvic.ca/~mgbarsky/DM_LABS/LAB_5/Lab5_ROC_weka.pdf

Then you just plot them. Or you can try to find a library that does it for you.

I saw this http://pypi.python.org/pypi/yard

The point is that generating a ROC curve is not specific to neural nets, so you may not find it built into a neural network library. I've provided the above to show that it's fairly simple to roll your own.

More detail

Your neural network will produce an output that you have to translate into a classification (likely yes/no). To calculate the ROC curve, you take several thresholds for yes/no (for example, output > 0.75 means yes, otherwise no). For each threshold you translate your network's output into classifications; by comparing those classifications to the true classifications, you get a false positive rate and a true positive rate. You then plot the true positive rate against the false positive rate as you sweep the threshold.
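Here's a minimal, untested sketch of that idea (y_true and scores are placeholders for the true test-set labels and the network's raw outputs; they are not named in the original answer):

# Hypothetical sketch, not from the original answer: compute ROC points
# by sweeping a threshold over the network's raw output scores.
import numpy as np

def roc_points(y_true, scores, thresholds):
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    points = []
    for t in thresholds:
        pred = scores >= t                        # classify as "yes" above the threshold
        tp = np.sum(pred & (y_true == 1))         # true positives
        fp = np.sum(pred & (y_true == 0))         # false positives
        tpr = tp / float(np.sum(y_true == 1))     # true positive rate (sensitivity)
        fpr = fp / float(np.sum(y_true == 0))     # false positive rate (1 - specificity)
        points.append((fpr, tpr))
    return points

# e.g. points = roc_points(y_true, scores, np.linspace(0, 1, 50)),
# then plot the fpr values against the tpr values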

answered by dfb