Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Plotting content from multiple data frames into a single ggplot2 surface

Tags:

r

ggplot2

I am a total R beginner here, with corresponding level of sophistication of this question.

I am using the ROCR package in R to generate plotting data for ROC curves. I then use ggplot2 to draw the plot. Something like this:

library(ggplot2)
library(ROCR)

inputFile <- read.csv("path/to/file", header=FALSE, sep=" ", colClasses=c('numeric','numeric'), col.names=c('score','label'))

predictions <- prediction(inputFile$score, inputFile$label)
auc <- performance(predictions, measure="auc")@y.values[[1]]

rocData <- performance(predictions, "tpr","fpr")
rocDataFrame <- data.frame([email protected][[1]],[email protected][[1]])

rocr.plot <- ggplot(data=rd, aes(x=x, y=y)) + geom_path(size=1)
rocr.plot <- rocr.plot + geom_text(aes(x=1, y= 0, hjust=1, vjust=0, label=paste(sep = "", "AUC = ",round(auc,4))),colour="black",size=4)

This works well for drawing a single ROC curve. However, what I would like to do is read in a whole directory worth of input files - one file per classifier test results - and make a ggplot2 multifaceted plot of all the ROC curves, while still printing the AUC score into each plot.

I would like to understand what is the "proper" R-style approach to accomplishing this. I am sure I can hack something together by having one loop go through all files in the directory and create a separate data frame for each, and then having another loop to create multiple plots, and somehow getting ggplo2 to output all these plots onto the same surface. However, that does not let me use ggplot2's built-in faceting, which I believe is the right approach. I am not sure how to get my data into proper shape for faceting use, though. Should I be merging all my data frames into a single one, and giving each merged chunk a name (e.g. filename) and faceting on that? If so, is there a library or recommended practice for making this happen?

Your suggestions are appreciated. I am still wrapping my head around the best practices in R, so I'd rather get expert advice instead of just hacking things up to make code that looks more like ordinary declarative programming languages that I am used to.

EDIT: The thing I am least clear on is whether, when using ggplot2's built-in faceting capabilities, I'd still be able to output a custom string (AUC score) into each plot it will generate.

like image 711
Inverseofverse Avatar asked Aug 08 '12 07:08

Inverseofverse


1 Answers

Here is an example of how to generate a plot as you described. I use the built-in dataset quakes:

The code does the following:

  • Load the ggplot2 and plyr packages
  • Add a facet variable to quakes - in this case I summarise by depth of earthquake
  • Use ddply to summarise the mean magnitude for each depth
  • Use ggplot with geom_text to label the mean magnitude

The code:

library(plyr)
library(ggplot2)

quakes$level <- cut(quakes$depth, 5, 
  labels=c("Very Shallow", "Shallow", "Medium", "Deep", "Very Deep"))

quakes.summary <- ddply(quakes, .(level), summarise, mag=round(mean(mag), 1))

ggplot(quakes, aes(x=long, y=lat)) + 
  geom_point(aes(colour=mag)) + 
  geom_text(aes(label=mag), data=quakes.summary, x=185, y=-35) +
  facet_grid(~level) + 
  coord_map()

enter image description here

like image 104
Andrie Avatar answered Oct 05 '22 23:10

Andrie