Let's assume we have the following data frame:
data <- data.frame(time=1:10, y1=runif(10), y2=runif(10), y3=runif(10))
and we want to create a plot like this:
library(ggplot2)
p <- ggplot(data, aes(x=time))
p <- p + geom_line(aes(y=y1, colour="y1"))
p <- p + geom_line(aes(y=y2, colour="y2"))
p <- p + geom_line(aes(y=y3, colour="y3"))
plot(p)
But what if we have many more "y" columns and do not know their exact names? This raises the question: how can we iterate over all columns programmatically and add them to the plot? Basically, the goal is:
otherFeatures <- names(data)[-1]
for (f in otherFeatures) {
# what goes here?
}
So far I have found many ways that do not work. For instance (all following examples only show the code line in the above for loop):
My first try was simply to use aes_string instead of aes in order to specify the column name by the loop variable f:
p <- p + geom_line(aes_string(y=f, colour=f))
But this does not give the same result, because colour is no longer a fixed label for each line: aes_string interprets f in the data frame environment, so colour is mapped to the values of that column. As a result, the legend becomes a color bar and does not contain the column names. My next guess was to mix aes and aes_string, trying to set colour to a fixed string:
p <- p + geom_line(aes_string(y=f), aes(colour=f))
But this results in Error: ggplot2 doesn't know how to deal with data of class uneval (the second positional argument of geom_line is data, so the aes object is taken as the data set). My next attempt was to use colour "absolutely" (not within aes) like this:
p <- p + geom_line(aes_string(y=f), colour=f)
But this gives Error: invalid color name 'y1' (and I don't want to pick proper color names manually either). The next try was to go back to aes only, replicating the manual approach:
p <- p + geom_line(aes(y=data[[f]], colour=f))
This does not give an error, but only the last column is plotted. This makes sense, since aes probably calls substitute, so the expression is only evaluated later, with the last value of f in the loop (calling rm(f) before plot(p) gives an error, indicating that the evaluation happens after the loop).
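The deferred evaluation described here can be checked directly. The following is a minimal sketch, assuming a ggplot2 version that exports layer_data(); set.seed() is only there to make the random example data reproducible, and first_layer and last_layer are names made up for this illustration. If the mapping is evaluated only when the plot is built, both comparisons at the end should come out TRUE.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the example data is reproducible
data <- data.frame(time = 1:10, y1 = runif(10), y2 = runif(10), y3 = runif(10))

p <- ggplot(data, aes(x = time))
for (f in names(data)[-1]) {
  # aes() stores the expressions data[[f]] and f, not their current values
  p <- p + geom_line(aes(y = data[[f]], colour = f))
}

# The stored expressions are evaluated when the plot is built, after the
# loop has finished, so every layer sees the final value of f.
first_layer <- layer_data(p, 1)
last_layer  <- layer_data(p, 3)

# Do the first and the last layer carry the same y values?
identical(first_layer$y, last_layer$y)
# Are those values the ones from the last column?
isTRUE(all.equal(first_layer$y, data$y3))
```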
To rephrase the question: what kind of substitute/eval/quote magic is necessary to replicate the simple code from above within a for loop?
This is old now, but in case anyone else comes across it: I had a very similar problem that was driving me crazy. The solution I found was to pass aes_q() to geom_line(), wrapping the column name in as.name(). Details on aes_q() are in the ggplot2 documentation. Below is the way I would solve this problem; the same principle should work in a loop. Note that I add multiple variables with geom_line() as a list here, which generalizes better (including to one variable).
varnames <- c("y1", "y2", "y3")
add_lines <- lapply(varnames, function(i) geom_line(aes_q(y = as.name(i), colour = i)))
p <- ggplot(data, aes(x = time))
p <- p + add_lines
plot(p)
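For newer installations: aes_string() and aes_q() have been soft-deprecated since ggplot2 3.0.0, where aes() itself supports tidy evaluation. The sketch below is one possible equivalent as a plain for loop, assuming ggplot2 3.0.0 or later; it is not the code from the answer above. The .data pronoun looks the column up by name, and !! inlines the current value of f so that each layer keeps its own column. set.seed() is only added to make the example data reproducible.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the example data is reproducible
data <- data.frame(time = 1:10, y1 = runif(10), y2 = runif(10), y3 = runif(10))

p <- ggplot(data, aes(x = time))
for (f in names(data)[-1]) {
  # !!f is replaced by the value f has right now ("y1", then "y2", ...),
  # so nothing is left to be looked up after the loop has finished.
  # colour = !!f becomes a constant string, which gives one legend entry
  # per column name.
  p <- p + geom_line(aes(y = .data[[!!f]], colour = !!f))
}
print(p)
```

Without the !!, the loop version would run into the same late-evaluation problem as aes(y=data[[f]], colour=f). The lapply() version does not need it, because each function call has its own i.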
Hope that helps!