Plotting two vectors of data on a ggplot2 scatter plot using R

I've been experimenting with both ggplot2 and lattice to graph panels of data. I'm having a little trouble wrapping my mind around the ggplot2 model. In particular, how do I plot a scatter plot with two sets of data on each panel:

in lattice I could do this:

xyplot(Predicted_value + Actual_value ~ x_value | State_CD, data=dd)

and that would give me a panel for each State_CD, with both columns plotted in each panel.

I can do one column with ggplot2:

pg <- ggplot(dd, aes(x_value, Predicted_value)) + geom_point(shape = 2) +
      facet_wrap(~ State_CD) + theme(aspect.ratio = 1)
print(pg)

What I can't grok is how to add Actual_value to the ggplot above.

EDIT Hadley pointed out that this really would be easier with a reproducible example. Here's code that seems to work. Is there a better or more concise way to do this with ggplot? Why is the syntax for adding another set of points to ggplot so different from adding the first set of data?

library(lattice)
library(ggplot2)

#make some example data: 72 rows, three numeric columns plus a state code
dd <- data.frame(matrix(rnorm(216), 72, 3), c(rep("A", 24), rep("B", 24), rep("C", 24)))
colnames(dd) <- c("Predicted_value", "Actual_value", "x_value", "State_CD")

#plot with lattice
xyplot(Predicted_value + Actual_value ~ x_value | State_CD, data=dd)

#plot with ggplot
pg <- ggplot(dd, aes(x_value, Predicted_value)) + geom_point(shape = 2) + facet_wrap(~ State_CD) + theme(aspect.ratio = 1)
print(pg)

pg + geom_point(data=dd,aes(x_value, Actual_value,group=State_CD), colour="green")

The lattice output looks like this: [image]
(source: cerebralmastication.com)

and ggplot looks like this: [image]
(source: cerebralmastication.com)

JD Long asked Aug 21 '09 19:08


2 Answers

Update: several years on now, I almost always use Jonathan's method (via the tidyr package) with ggplot2. My answer below works in a pinch, but gets tedious fast when you have 3+ variables.
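For reference, a minimal sketch of that tidyr approach. It assumes tidyr's pivot_longer() as the reshaping step (the answer it refers to uses melt() instead), and the example data and the name dd_long are made up for illustration:

```r
library(ggplot2)
library(tidyr)

# Illustrative data shaped like the question's: two y columns, one x column, one grouping column
set.seed(1)
dd <- data.frame(
  Predicted_value = rnorm(72),
  Actual_value    = rnorm(72),
  x_value         = rnorm(72),
  State_CD        = rep(c("A", "B", "C"), each = 24)
)

# Stack the two y columns into a single "value" column;
# "variable" records which original column each row came from
dd_long <- pivot_longer(
  dd,
  cols      = c(Predicted_value, Actual_value),
  names_to  = "variable",
  values_to = "value"
)

# One geom_point() layer is enough once colour is mapped to "variable"
pg <- ggplot(dd_long, aes(x_value, value, colour = variable)) +
  geom_point() +
  facet_wrap(~ State_CD) +
  theme(aspect.ratio = 1)
print(pg)
```

Adding a third or fourth series then only means listing more columns in cols, with no extra layers.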


I'm sure Hadley will have a better answer, but - the syntax is different because the ggplot(dd,aes()) syntax is (I think) primarily intended for plotting just one variable. For two, I would use:

ggplot() + 
geom_point(data=dd, aes(x_value, Actual_value, group=State_CD), colour="green") + 
geom_point(data=dd, aes(x_value, Predicted_value, group=State_CD), shape = 2) + 
facet_wrap(~ State_CD) + 
theme(aspect.ratio = 1)

Pulling the first set of points out of the ggplot() gives it the same syntax as the second. I find this easier to deal with because the syntax is the same and it emphasizes the "Grammar of Graphics" that is at the core of ggplot2.
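A variation on the same idea, as a sketch rather than the only way to do it: the data and the shared x mapping can stay in ggplot(), with each layer supplying only its own y. The example data here is made up for illustration:

```r
library(ggplot2)

# Illustrative data shaped like the question's
set.seed(1)
dd <- data.frame(
  Predicted_value = rnorm(72),
  Actual_value    = rnorm(72),
  x_value         = rnorm(72),
  State_CD        = rep(c("A", "B", "C"), each = 24)
)

# Layers inherit data and x from ggplot(); each one adds only its y aesthetic
pg <- ggplot(dd, aes(x = x_value)) +
  geom_point(aes(y = Actual_value), colour = "green") +
  geom_point(aes(y = Predicted_value), shape = 2) +
  facet_wrap(~ State_CD) +
  ylab("value") +
  theme(aspect.ratio = 1)
print(pg)
```

Both layers still read the same way, and the repeated data = dd and x_value arguments drop out.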

Matt Parker answered Oct 26 '22 12:10


Just following up on what Ian suggested: for ggplot2 you really want all the y-axis stuff in one column with another column as a factor indicating how you want to decorate it. It is easy to do this with melt. To wit:

library(ggplot2)
library(reshape2)  # provides melt(); the older reshape package has it too

qplot(x_value, value, 
      data = melt(dd, measure.vars=c("Predicted_value", "Actual_value")), 
      colour=variable) + facet_wrap(~State_CD)

Here's what it looks like for me: [image]
(source: princeton.edu)

To get an idea of what melt is actually doing, here's the head:

> head(melt(dd, measure.vars=c("Predicted_value", "Actual_value")))
     x_value State_CD        variable      value
1  1.2898779        A Predicted_value  1.0913712
2  0.1077710        A Predicted_value -2.2337188
3 -0.9430190        A Predicted_value  1.1409515
4  0.3698614        A Predicted_value -1.8260033
5 -0.3949606        A Predicted_value -0.3102753
6 -0.1275037        A Predicted_value -1.2945864

You see, it "melts" Predicted_value and Actual_value into one column called value and adds another column called variable letting you know what column it originally came from.
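A tiny worked example may make the reshaping easier to see. This is only a sketch: it assumes melt() from the reshape2 package, and the data frame wide and its values are made up for illustration:

```r
library(reshape2)

# Two rows, one identifier column and two measured columns
wide <- data.frame(
  x_value         = c(1, 2),
  Predicted_value = c(10, 20),
  Actual_value    = c(11, 19)
)

# Columns not listed in measure.vars (here x_value) are kept as identifiers
# and repeated once per measured column
long <- melt(wide, measure.vars = c("Predicted_value", "Actual_value"))
print(long)
```

The result has four rows: the two Predicted_value rows followed by the two Actual_value rows, each carrying its original x_value.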

Jonathan Chang answered Oct 26 '22 13:10