
ggplot: aes vs aes_string, or how to programmatically specify column names?

Tags: plot, r, ggplot2

Let's assume we have the following data frame:

data <- data.frame(time=1:10, y1=runif(10), y2=runif(10), y3=runif(10))

and we want to create a plot like this:

library(ggplot2)
p <- ggplot(data, aes(x=time))
p <- p + geom_line(aes(y=y1, colour="y1"))
p <- p + geom_line(aes(y=y2, colour="y2"))
p <- p + geom_line(aes(y=y3, colour="y3"))
plot(p)


But what if we have many more "y" columns and do not know their names in advance? This raises the question: how can we iterate over all columns programmatically and add them to the plot? Basically the goal is:

otherFeatures <- names(data)[-1]
for (f in otherFeatures) {
  # what goes here?
}

Failed Attempts

So far I have found many ways that do not work. Each of the following examples shows only the line that goes inside the for loop above.

My first try was simply to use aes_string instead of aes in order to specify the column name by the loop variable f:

p <- p + geom_line(aes_string(y=f, colour=f))

But this does not give the same result, because colour is no longer a fixed label for each line: aes_string parses the string in f as an expression, so colour is mapped to the numeric column itself. As a result, the legend becomes a continuous colour bar and does not list the column names. My next guess was to mix aes and aes_string, trying to set colour to a fixed string:

p <- p + geom_line(aes_string(y=f), aes(colour=f))

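But this results in Error: ggplot2 doesn't know how to deal with data of class uneval, presumably because the second positional argument of geom_line is data, so the aes() result is treated as a data set. My next attempt was to set colour as a constant (outside aes) like this: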

p <- p + geom_line(aes_string(y=f), colour=f)

But this gives Error: invalid color name 'y1' (and I don't want to pick some proper color names manually either). The next try was to go back to aes only, replicating the manual approach:

p <- p + geom_line(aes(y=data[[f]], colour=f))

This does not give an error, but it only plots the last column. This makes sense, since aes probably captures its arguments unevaluated (e.g. via substitute), so every layer's expression is evaluated with the last value of f from the loop. Calling rm(f) before plot(p) gives an error, which indicates that the evaluation happens after the loop.
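The late-evaluation behaviour can be reproduced without ggplot2. The sketch below is only an analogy: it assumes aes() stores an unevaluated expression much as quote() does, and the names captured, inlined and env are made up for illustration. It also shows that substituting the current value of f into the expression (here with bquote()) avoids the problem.

```r
# Analogy only: quote() stands in for whatever aes() does internally.
set.seed(42)
data <- data.frame(time = 1:10, y1 = runif(10), y2 = runif(10), y3 = runif(10))

captured <- list()  # expressions that still refer to the loop variable f
inlined  <- list()  # expressions with the current value of f substituted in
for (f in names(data)[-1]) {
  captured[[f]] <- quote(data[[f]])       # stores the code, not the column
  inlined[[f]]  <- bquote(data[[.(f)]])   # .(f) is replaced by "y1", "y2", ...
}

# Evaluate after the loop, when f has its final value "y3".
env   <- environment()
late  <- lapply(captured, function(e) eval(e, envir = env))
fixed <- lapply(inlined,  function(e) eval(e, envir = env))

# Every late-evaluated expression returns the y3 column ...
all_last <- vapply(late, function(v) identical(v, data$y3), logical(1))
# ... while each inlined expression returns its own column.
all_own <- vapply(names(fixed),
                  function(n) identical(fixed[[n]], data[[n]]),
                  logical(1))
print(all_last)
print(all_own)
```

If the assumption holds, any working solution has to fix the value of f at the moment the layer is created, not when the plot is drawn.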

To rephrase the question: What kind of substitute/eval/quote magic is necessary to replicate the simple code from above within a for loop?

bluenote10 asked Nov 12 '14 20:11


1 Answer

This is old now, but in case anyone else comes across it: I had a very similar problem that was driving me crazy. The solution I found was to pass aes_q() to geom_line() and turn each column name into a symbol with as.name(). Details on aes_q() are in its help page (?aes_q). Below is the way I would solve this problem, though the same principle should work in a loop. Note that I add the geom_line() layers as a list here, which generalizes better (including to a single variable).

varnames <- c("y1", "y2", "y3")
add_lines <- lapply(varnames, function(i) geom_line(aes_q(y = as.name(i), colour = i)))

p <- ggplot(data, aes(x = time))
p <- p + add_lines
plot(p)
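For anyone on a newer ggplot2: aes_q() and aes_string() have been deprecated since version 3.0.0 in favour of tidy evaluation inside aes(). Below is a minimal sketch of the loop from the question under that assumption (ggplot2 >= 3.0.0). It swaps aes_q() for aes() with the !! operator and the .data pronoun, so it is a different technique from the one above, not a drop-in copy. !!f inserts the current value of f at the moment aes() is called, so each layer keeps its own column and label.

```r
library(ggplot2)  # assumes ggplot2 >= 3.0.0 for !! and .data inside aes()

set.seed(42)
data <- data.frame(time = 1:10, y1 = runif(10), y2 = runif(10), y3 = runif(10))

p <- ggplot(data, aes(x = time))
for (f in names(data)[-1]) {
  # !!f splices the current string in immediately:
  #   .data[[!!f]] looks up the column with that name,
  #   colour = !!f maps colour to the constant label.
  p <- p + geom_line(aes(y = .data[[!!f]], colour = !!f))
}
print(p)
```

Because colour is mapped to a constant string in each layer, the legend should come out discrete with one entry per column, as in the manual version from the question.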

Hope that helps!

Michael answered Oct 13 '22 12:10