I am new to machine learning in Python, so please forgive my naive question. Is there a Python library for implementing neural networks that also gives me ROC and AUC curves? I know of Python libraries that implement neural networks, but I am looking for one that also helps me plot ROC, DET and AUC curves.
In this case it makes sense to divide your question into two topics, since neural networks are hardly directly related to ROC curves.
I think there's no better way to learn than by example, so I'll show you an approach to your problem using a binary classification problem solved with a feed-forward neural network, inspired by this tutorial from pybrain.
The first thing is to define a dataset. The easiest one to visualize is a binary dataset on a 2D plane, with points generated from normal distributions, each of them belonging to one of the 2 classes. With the means and covariances below, the two classes are almost linearly separable.
from pybrain.datasets import ClassificationDataSet
from pybrain.utilities import percentError
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules import SoftmaxLayer
from pylab import ion, ioff, figure, draw, contourf, clf, show, hold, plot
from scipy import diag, arange, meshgrid, where
from numpy.random import multivariate_normal
means = [(-1,0),(2,4),(3,1)]
cov = [diag([1,1]), diag([0.5,1.2]), diag([1.5,0.7])]
n_klass = 2
alldata = ClassificationDataSet(2, 1, nb_classes=n_klass)
for n in xrange(400):
    for klass in range(n_klass):
        input = multivariate_normal(means[klass],cov[klass])
        alldata.addSample(input, [klass])
To visualize, it looks something like this:
Now you want to split it into a training set and a test set:
tstdata, trndata = alldata.splitWithProportion(0.25)
trndata._convertToOneOfMany()
tstdata._convertToOneOfMany()
Then create your network and trainer, along with a grid of points that is used later to plot the decision boundary:
fnn = buildNetwork( trndata.indim, 5, trndata.outdim, outclass=SoftmaxLayer )
trainer = BackpropTrainer( fnn, dataset=trndata, momentum=0.1, verbose=True, weightdecay=0.01)
ticks = arange(-3.,6.,0.2)
X, Y = meshgrid(ticks, ticks)
# need column vectors in dataset, not arrays
griddata = ClassificationDataSet(2,1, nb_classes=n_klass)
for i in xrange(X.size):
    griddata.addSample([X.ravel()[i],Y.ravel()[i]], [0])
griddata._convertToOneOfMany() # this is still needed to make the fnn feel comfy
Now you need to train your network and see what results you get in the end:
for i in range(20):
    trainer.trainEpochs( 1 )
    trnresult = percentError( trainer.testOnClassData(),
                              trndata['class'] )
    tstresult = percentError( trainer.testOnClassData(
           dataset=tstdata ), tstdata['class'] )
    print "epoch: %4d" % trainer.totalepochs, \
          " train error: %5.2f%%" % trnresult, \
          " test error: %5.2f%%" % tstresult
    out = fnn.activateOnDataset(griddata)
    out = out.argmax(axis=1) # the highest output activation gives the class
    out = out.reshape(X.shape)
    figure(1)
    ioff() # interactive graphics off
    clf() # clear the plot
    hold(True) # overplot on
    for c in range(n_klass):
        here, _ = where(tstdata['class']==c)
        plot(tstdata['input'][here,0],tstdata['input'][here,1],'o')
    if out.max()!=out.min(): # safety check against flat field
        contourf(X, Y, out) # plot the contour
    ion() # interactive graphics on
    draw() # update the plot
Which gives you a very bad boundary at the beginning:
But in the end a pretty good result:
As for ROC curves, PyROC is a nice and simple Python library for them; here it is applied to a random toy problem:
from pyroc import *
random_sample = random_mixture_model() # Generate a custom set randomly
# Example instances: label (first index) with the decision function score (second index)
# -- the positive class should be +1 and the negative class 0.
roc = ROCData(random_sample) #Create the ROC Object
roc.auc() #get the area under the curve
roc.plot(title='ROC Curve') #Create a plot of the ROC curve
Which gives you a single ROC curve:
Of course you can also plot multiple ROC curves on the same graph:
x = random_mixture_model()
r1 = ROCData(x)
y = random_mixture_model()
r2 = ROCData(y)
lista = [r1,r2]
plot_multiple_roc(lista,'Multiple ROC Curves',include_baseline=True)
(Remember that the diagonal is the curve of a random classifier; if your ROC curve lies close to it, your classifier is doing no better than chance and you're probably doing something wrong.)
You can probably easily use this module in any of your classification tasks (not limited to neural networks), and it will produce ROC curves for you.
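To see why the diagonal corresponds to a random classifier, here is a minimal sketch in plain Python (it does not use PyROC). It relies on the fact that the AUC equals the probability that a randomly chosen positive gets a higher score than a randomly chosen negative. The function name `pairwise_auc`, the sample sizes and the score distributions are all illustrative assumptions.

```python
import random

def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correctly ranked pair.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

random.seed(0)
# Scores that ignore the label: both classes draw from the same distribution.
random_pos = [random.random() for _ in range(400)]
random_neg = [random.random() for _ in range(400)]
# Scores that carry information: positives are shifted upwards.
useful_pos = [random.gauss(1.0, 1.0) for _ in range(400)]
useful_neg = [random.gauss(-1.0, 1.0) for _ in range(400)]

auc_random = pairwise_auc(random_pos, random_neg)
auc_useful = pairwise_auc(useful_pos, useful_neg)
print("AUC with uninformative scores: %.3f" % auc_random)
print("AUC with informative scores:   %.3f" % auc_useful)
```

The uninformative scores should land close to 0.5, which is the area under the diagonal, while the informative ones should come out well above it.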
Now, to get the class probabilities needed to plot a ROC curve from your neural network, you just need to look at its activations: activateOnDataset in pybrain will give you the probability for both classes (in my example above we just take the maximum of the probabilities to determine which class to consider). From there, just transform the output into the format expected by PyROC, like the one produced by random_mixture_model, and it should give you your ROC curve.
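As a rough sketch of that transformation, the helper below pairs each true label with the network's probability for the positive class. The (label, score) tuple layout is inferred from the comments in the PyROC example above and should be checked against PyROC itself. The name `to_label_score_pairs` and the sample values are hypothetical, and plain lists stand in for the arrays pybrain would return.

```python
def to_label_score_pairs(class_probs, true_labels, positive_class=1):
    """Pair each true label (1 = positive, 0 = negative) with the
    probability the network assigned to the positive class."""
    if len(class_probs) != len(true_labels):
        raise ValueError("need exactly one label per row of probabilities")
    pairs = []
    for probs, label in zip(class_probs, true_labels):
        is_positive = 1 if int(label) == positive_class else 0
        pairs.append((is_positive, float(probs[positive_class])))
    return pairs

# Hypothetical stand-in for fnn.activateOnDataset(tstdata):
# one row per sample, one column per class.
example_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
# Hypothetical stand-in for the flattened tstdata['class'] column.
example_labels = [0, 1, 1, 0]

pairs = to_label_score_pairs(example_probs, example_labels)
print(pairs)
```

If the tuple layout matches what PyROC expects, a list like `pairs` is what you would hand to `ROCData` in place of `random_sample`.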
Sure. First, check out this
https://stackoverflow.com/questions/2276933/good-open-source-neural-network-python-library
This is my general idea. I'm sketching out how I might approach this; none of it is tested.
From http://pybrain.org/docs/tutorial/netmodcon.html#feed-forward-networks
>>> from pybrain.structure import FeedForwardNetwork
>>> n = FeedForwardNetwork()
>>> # ... add layers and connections, then call n.sortModules() (see the page linked above) ...
>>> n.activate((2, 2))
array([-0.1959887])
We build a neural net, train it (not shown) and get the output. You have a test set, right? You use the test set to generate the data for the ROC curve. For a single-output neural net, you want to pick a threshold on the output values that translates them into yes or no responses and gives the best trade-off between specificity and sensitivity for your task.
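One possible way to pick such a threshold is to maximise Youden's J statistic (sensitivity + specificity - 1) over the candidate thresholds. This is only one criterion among several, chosen here for illustration, and your task may weight the two rates differently. The function name `best_threshold` and the toy scores are assumptions.

```python
def best_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) maximising Youden's J.

    A sample is classified as positive when its score is >= threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sensitivity = tp / float(n_pos)
        specificity = tn / float(n_neg)
        j = sensitivity + specificity - 1.0
        if best is None or j > best[0]:
            best = (j, t, sensitivity, specificity)
    return best[1:]

# Toy network outputs on a test set, with the true labels (1 = yes, 0 = no).
scores = [0.1, 0.3, 0.45, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1, 1]

threshold, sens, spec = best_threshold(scores, labels)
print("threshold=%.2f sensitivity=%.2f specificity=%.2f" % (threshold, sens, spec))
```

On this toy data the chosen threshold is 0.7, which keeps every negative below it at the cost of missing one positive.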
This is a good tutorial http://webhome.cs.uvic.ca/~mgbarsky/DM_LABS/LAB_5/Lab5_ROC_weka.pdf
Then you just plot them. Or you can try to find a library that does it for you
I saw this http://pypi.python.org/pypi/yard
The point is that generating a ROC curve is not specific to neural nets, so you may not find a neural net library that does it for you. I've provided the above to show it's fairly simple to roll your own.
* More detail *
Your neural network is going to have an output that you will have to translate into a classification (likely yes/no). To calculate the ROC curve, you take several thresholds for yes/no (for example, an output above 0.75 is a yes and anything below is a no). With each threshold, you translate the output of your neural net into classifications. By comparing those classifications to the true classifications, you get a false positive rate and a true positive rate. The ROC curve is the plot of the true positive rate against the false positive rate as you vary that threshold.
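The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not a replacement for a tested library; the names `roc_points` and `trapezoid_auc`, the toy scores and the choice of thresholds are all assumptions. The area under the curve is estimated with the trapezoidal rule.

```python
def roc_points(scores, labels, thresholds):
    """Return one (false_positive_rate, true_positive_rate) pair per threshold.

    A sample is classified as "yes" when its score is above the threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = []
    for t in thresholds:
        predicted = [1 if s > t else 0 for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
        points.append((fp / float(n_neg), tp / float(n_pos)))
    return points

def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the ROC points."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy network outputs with their true labels (1 = yes, 0 = no).
scores = [0.95, 0.85, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
thresholds = [1.0, 0.75, 0.5, 0.25, 0.0]

points = roc_points(scores, labels, thresholds)
auc = trapezoid_auc(points)
for t, (fpr, tpr) in zip(thresholds, points):
    print("threshold %.2f -> FPR %.2f, TPR %.2f" % (t, fpr, tpr))
print("AUC: %.3f" % auc)
```

Passing the x and y values of `points` to any plotting function gives the curve itself; using more thresholds gives a smoother curve.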