I have a melted data frame with the standard id, variable and value columns; variable has 4 levels.
I want to use ggplot to draw scatter plots of the values in value, plotting each level of variable against the others.
To illustrate:
data.frame(id= gl(4,1,labels=paste("id",1:4,sep="")), variable=gl(4,4,labels=LETTERS[1:4]),value=rnorm(16))
    id variable        value
1  id1        A -0.494270766
2  id2        A  0.189400188
3  id3        A -0.550961030
4  id4        A -1.046945450
5  id1        B -0.525552660
6  id2        B -0.293601677
7  id3        B  0.009664513
8  id4        B -0.214687215
9  id1        C  1.253551926
10 id2        C -1.241847326
11 id3        C -0.307036508
12 id4        C -0.228632605
13 id1        D -1.683798512
14 id2        D -0.419295267
15 id3        D -0.154469178
16 id4        D -0.763460558
I want to produce ggplot scatter plots for each pair of levels of variable (A vs B, A vs C, A vs D, B vs C, and so on), and then add smoothers to them afterwards.
Cheers, Davy
Here's a slightly modified version of plotmatrix in ggplot2 that does this:
library(ggplot2)
library(reshape2)
library(plyr)  # provides defaults(), which plotmatrix() uses below

dat <- data.frame(id = gl(4, 1, labels = paste("id", 1:4, sep = "")),
                  variable = gl(4, 4, labels = LETTERS[1:4]),
                  value = rnorm(16))
# cast from long to wide: one column per level of variable
dat <- dcast(dat, id ~ variable, value.var = "value")

plotmatrix <- function(data, mapping = aes(), colour = "black") {
  # one facet for every ordered pair of distinct columns
  grid <- expand.grid(x = 1:ncol(data), y = 1:ncol(data))
  grid <- subset(grid, x != y)
  all <- do.call("rbind", lapply(1:nrow(grid), function(i) {
    xcol <- grid[i, "x"]
    ycol <- grid[i, "y"]
    data.frame(xvar = names(data)[ycol], yvar = names(data)[xcol],
               x = data[, xcol], y = data[, ycol], data)
  }))
  all$xvar <- factor(all$xvar, levels = names(data))
  all$yvar <- factor(all$yvar, levels = names(data))
  # density curves for the diagonal facets
  densities <- do.call("rbind", lapply(1:ncol(data), function(i) {
    data.frame(xvar = names(data)[i], yvar = names(data)[i],
               x = data[, i])
  }))
  densities$xvar <- factor(densities$xvar, levels = names(data))
  densities$yvar <- factor(densities$yvar, levels = names(data))
  mapping <- defaults(mapping, aes_string(x = "x", y = "y"))
  class(mapping) <- "uneval"
  ggplot(all, mapping) +
    facet_grid(xvar ~ yvar, scales = "free") +
    geom_point(colour = colour, na.rm = TRUE) +
    stat_density(aes(x = x, y = ..scaled.. * diff(range(x)) + min(x)),
                 data = densities, position = "identity",
                 colour = "grey20", geom = "line") +
    # the added smoother: a linear fit in each facet
    geom_smooth(se = FALSE, method = "lm", colour = "blue")
}

# drop the id column before plotting
plotmatrix(dat[, -1])
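The function above draws both orderings of every pair (A vs B and B vs A), whereas the question lists each pair once. If only the unique pairs are wanted, a minimal sketch along the following lines might do, working straight from the melted data. The names long, var_pairs and paired are just illustrative, and the sketch assumes every id appears exactly once per level of variable.

```r
library(ggplot2)

set.seed(1)  # illustrative data in the same shape as the question's
long <- data.frame(id = gl(4, 1, 16, labels = paste0("id", 1:4)),
                   variable = gl(4, 4, labels = LETTERS[1:4]),
                   value = rnorm(16))

# all unordered pairs of levels: A-B, A-C, A-D, B-C, B-D, C-D
var_pairs <- combn(levels(long$variable), 2, simplify = FALSE)

# for each pair, match the two levels by id and stack the results
paired <- do.call(rbind, lapply(var_pairs, function(p) {
  xs <- long[long$variable == p[1], c("id", "value")]
  ys <- long[long$variable == p[2], c("id", "value")]
  both <- merge(xs, ys, by = "id", suffixes = c(".x", ".y"))
  data.frame(id = both$id,
             pair = paste(p[1], "vs", p[2]),
             x = both$value.x,
             y = both$value.y)
}))

# one facet per pair, with a linear smoother on top of the points
p <- ggplot(paired, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
  facet_wrap(~ pair, scales = "free")
p
```

Matching by id with merge() rather than relying on row order keeps the pairing correct even if the melted data frame has been sorted differently.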
Following @Dason's suggestion to try the GGally package and using @baptise's reshaping code...
library(ggplot2)
library(reshape2)
library(plyr)
library(GGally)
#
n <- 100 # number of observations
i <- 4 # number of variables, cannot exceed 26 since letters are used as labels
#
# create data, following @Davy
d <- data.frame(id = gl(n, 1, labels = paste("id", 1:n, sep = "")),
                variable = gl(i, n, labels = LETTERS[1:i]),
                value = rnorm(n * i))
#
# reshape for plotting, from @baptise
group <- unique(d$variable)
m <- dcast(d, ...~variable, subset=.(variable %in% group))
#
# make scatterplot matrix using GGally package
# as suggested by @Dason
ggpairs(m[, 2:ncol(m)],
        lower = list(continuous = "smooth"),
        axisLabels = "show")
# done!
The result is a bit busy, with grid lines in the boxes above the diagonal (though no doubt they can be turned off), and some other finishing touches are needed before this could go prime-time.
But it's generally true to the ggplot2 approach (the smoother can be removed, if required). The GGally code is available on GitHub.
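As a rough sketch of those two tweaks, something like the following should drop the smoother and the grid lines. It assumes a reasonably recent GGally in which a ggpairs result accepts + theme(...); the wide data frame here is made up for illustration and stands in for m[, 2:ncol(m)] above.

```r
library(ggplot2)
library(GGally)

set.seed(1)  # illustrative wide data: one column per variable
wide <- data.frame(A = rnorm(100), B = rnorm(100),
                   C = rnorm(100), D = rnorm(100))

p <- ggpairs(wide,
             lower = list(continuous = "points"),  # plain scatter, no smoother
             axisLabels = "show") +
  theme(panel.grid = element_blank())              # assumption: theme is applied to every panel
p
```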
It's also worth noting that there are examples (including code) of a fantastic variety of scatterplot matrices that can be done in R at Romain François' R Graph Gallery. This one is quite similar to the one above.