I am a total R beginner, and the sophistication of this question reflects that.
I am using the ROCR package in R to generate plotting data for ROC curves. I then use ggplot2 to draw the plot. Something like this:
library(ggplot2)
library(ROCR)
inputFile <- read.csv("path/to/file", header=FALSE, sep=" ", colClasses=c('numeric','numeric'), col.names=c('score','label'))
predictions <- prediction(inputFile$score, inputFile$label)
auc <- performance(predictions, measure="auc")@y.values[[1]]
rocData <- performance(predictions, "tpr","fpr")
rocDataFrame <- data.frame(x=rocData@x.values[[1]], y=rocData@y.values[[1]])
rocr.plot <- ggplot(data=rocDataFrame, aes(x=x, y=y)) + geom_path(size=1)
rocr.plot <- rocr.plot + geom_text(aes(x=1, y= 0, hjust=1, vjust=0, label=paste(sep = "", "AUC = ",round(auc,4))),colour="black",size=4)
This works well for drawing a single ROC curve. However, what I would like to do is read in a whole directory worth of input files - one file per classifier test results - and make a ggplot2 multifaceted plot of all the ROC curves, while still printing the AUC score into each plot.
I would like to understand what the "proper" R-style approach to accomplishing this is. I am sure I can hack something together by having one loop go through all files in the directory and create a separate data frame for each, then having another loop create multiple plots, and somehow getting ggplot2 to output all these plots onto the same surface. However, that does not let me use ggplot2's built-in faceting, which I believe is the right approach. I am not sure how to get my data into the proper shape for faceting, though. Should I be merging all my data frames into a single one, giving each merged chunk a name (e.g. the filename), and faceting on that? If so, is there a library or recommended practice for making this happen?
Your suggestions are appreciated. I am still wrapping my head around the best practices in R, so I'd rather get expert advice instead of just hacking things up to make code that looks more like ordinary declarative programming languages that I am used to.
EDIT: The thing I am least clear on is whether, when using ggplot2's built-in faceting capabilities, I'd still be able to output a custom string (AUC score) into each plot it will generate.
Here is an example of how to generate a plot as you described. I use the built-in dataset quakes.
The code does the following:
1. Load the ggplot2 and plyr packages
2. Add a grouping column to quakes - in this case I summarise by depth of earthquake
3. Use ddply to summarise the mean magnitude for each depth
4. Use ggplot with geom_text to label the mean magnitude
The code:
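library(plyr)
library(ggplot2)
# Bin earthquake depth into five labelled levels to facet on
quakes$level <- cut(quakes$depth, 5,
  labels=c("Very Shallow", "Shallow", "Medium", "Deep", "Very Deep"))
# One row per level, holding the mean magnitude rounded to one decimal
quakes.summary <- ddply(quakes, .(level), summarise, mag=round(mean(mag), 1))
# Facet by level; the summary data frame supplies one text label per facet
ggplot(quakes, aes(x=long, y=lat)) +
  geom_point(aes(colour=mag)) +
  geom_text(aes(label=mag), data=quakes.summary, x=185, y=-35) +
  facet_grid(~level) +
  coord_map()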
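The same pattern can be carried over to the ROC case in the question. The following is only a rough sketch under some assumptions: the three input files are simulated and written to a temporary directory so that the code runs on its own, and the names inputDir, rocForFile, rocAll and aucAll are made up for illustration. The idea is to build one long data frame of ROC points with a classifier column to facet on, plus a small data frame with one AUC label per classifier for geom_text.

```r
library(ROCR)
library(ggplot2)

# Assumption: simulated score/label files stand in for the real directory.
set.seed(42)
inputDir <- file.path(tempdir(), "roc_inputs")
dir.create(inputDir, showWarnings = FALSE)
for (name in c("classifierA", "classifierB", "classifierC")) {
  label <- rbinom(200, 1, 0.5)
  score <- label + rnorm(200, sd = 1.5)  # noisy scores, higher for positives
  write.table(data.frame(score, label),
              file.path(inputDir, paste0(name, ".txt")),
              sep = " ", row.names = FALSE, col.names = FALSE)
}

# Hypothetical helper: ROC points and AUC for a single file
rocForFile <- function(path) {
  input <- read.csv(path, header = FALSE, sep = " ",
                    colClasses = c("numeric", "numeric"),
                    col.names = c("score", "label"))
  pred <- prediction(input$score, input$label)
  roc <- performance(pred, "tpr", "fpr")
  list(
    curve = data.frame(classifier = basename(path),
                       x = roc@x.values[[1]],
                       y = roc@y.values[[1]]),
    auc = data.frame(classifier = basename(path),
                     auc = performance(pred, measure = "auc")@y.values[[1]])
  )
}

files <- list.files(inputDir, full.names = TRUE)
results <- lapply(files, rocForFile)

# One long data frame of curve points, one short data frame of labels
rocAll <- do.call(rbind, lapply(results, `[[`, "curve"))
aucAll <- do.call(rbind, lapply(results, `[[`, "auc"))
aucAll$label <- paste0("AUC = ", round(aucAll$auc, 4))
aucAll$x <- 1  # label position: bottom-right corner of each facet
aucAll$y <- 0

rocPlot <- ggplot(rocAll, aes(x = x, y = y)) +
  geom_path() +
  geom_text(data = aucAll, aes(label = label), hjust = 1, vjust = 0, size = 4) +
  facet_wrap(~classifier) +
  labs(x = "False positive rate", y = "True positive rate")
print(rocPlot)
```

Because aucAll has the same classifier column as rocAll, each label should land in its matching facet. To use real data, drop the simulation loop and point inputDir at the directory of result files.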