I am trying to create a pairs plot of 6 data variables using ggplot2 and colour the points according to the k-means cluster they belong to. I read the documentation of the highly impressive 'GGally' package as well as an informal fix by Adam Laiacano [http://adamlaiacano.tumblr.com/post/13501402316/colored-plotmatrix-in-ggplot2]. Unfortunately, I could not find any way to get the desired output in either.
Here is some sample code:
#The Swiss fertility dataset has been used here
data_ <- read.csv("/home/tejaskale/Ubuntu\ One/IUCAA/Datasets/swiss.csv", header=TRUE)
data_ <- na.omit(data_)
u <- c(2, 3, 4, 5, 6, 7)
x <- data_[,u]
k <- 3
maxIterations <- 100
noOfStarts <- 100
filename <- 'swiss.csv'
library(ggplot2)
library(gridExtra)
library(GGally)
kmeansOutput <- kmeans(x, k, maxIterations, noOfStarts)
xNew <- cbind(x[,1:6], as.factor(kmeansOutput$cluster))
names(xNew)[7] <- 'cluster'
kmeansPlot <- ggpairs(xNew[,1:6], color=xNew$cluster)
OR
kmeansPlot <- plotmatrix(xNew[,1:6], mapping=aes(colour=xNew$cluster))
Both plots are created but aren't coloured according to clusters.
Hope I haven't missed an answer to this question on the forum and apologize if that is indeed the case. Any help would be highly appreciated.
Thanks!
The following slight modification of plotmatrix2 works fine for me:
plotmatrix2 <- function (data, mapping = aes())
{
    # All off-diagonal (x, y) column pairs
    grid <- expand.grid(x = 1:ncol(data), y = 1:ncol(data))
    grid <- subset(grid, x != y)
    # One block of rows per pair; the original columns are kept alongside
    # so that mappings such as colour can refer to them by name
    all <- do.call("rbind", lapply(1:nrow(grid), function(i) {
        xcol <- grid[i, "x"]
        ycol <- grid[i, "y"]
        data.frame(xvar = names(data)[ycol], yvar = names(data)[xcol],
            x = data[, xcol], y = data[, ycol], data)
    }))
    all$xvar <- factor(all$xvar, levels = names(data))
    all$yvar <- factor(all$yvar, levels = names(data))
    # Data for the density curves on the diagonal
    densities <- do.call("rbind", lapply(1:ncol(data), function(i) {
        data.frame(xvar = names(data)[i], yvar = names(data)[i],
            x = data[, i])
    }))
    # Force the faceting variables to be factors with the same levels
    densities$xvar <- factor(densities$xvar, levels = names(data))
    densities$yvar <- factor(densities$yvar, levels = names(data))
    # Add x and y to whatever aesthetics the caller supplied
    mapping <- defaults(mapping, aes_string(x = "x", y = "y"))
    class(mapping) <- "uneval"
    ggplot(all) + facet_grid(xvar ~ yvar, scales = "free") +
        geom_point(mapping, na.rm = TRUE) + stat_density(aes(x = x,
        y = ..scaled.. * diff(range(x)) + min(x)), data = densities,
        position = "identity", colour = "grey20", geom = "line")
}
plotmatrix2(mtcars[,1:3],aes(colour = factor(cyl)))
It may be a ggplot2 version issue, but I had to force the faceting variables in the densities data frame to be factors (that seems broken to me even in the GGally version). Also, as a general rule, don't pass vectors to aes(); pass column names instead.
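To illustrate the point about column names, here is a minimal sketch of the same idea using ggpairs. It assumes a reasonably recent GGally release in which ggpairs() accepts `mapping` and `columns` arguments, and it uses R's built-in `swiss` data frame as a stand-in for the CSV in the question; the seed is arbitrary and only there to make the clustering reproducible.

```r
library(ggplot2)
library(GGally)

# Assumption: the built-in swiss data (6 numeric columns) stands in for
# the CSV file read in the question
x <- swiss

set.seed(1)  # arbitrary seed, only for reproducible cluster labels
km <- kmeans(x, centers = 3, iter.max = 100, nstart = 100)

# Store the cluster labels as a named factor column of the data frame
xNew <- data.frame(x, cluster = factor(km$cluster))

# Refer to the column by name inside aes(), and pass the whole data frame;
# `columns` restricts the panels to the six data variables
kmeansPlot <- ggpairs(xNew, mapping = aes(colour = cluster), columns = 1:6)
print(kmeansPlot)
```

The difference from the code in the question is that the cluster column travels with the data passed to the plotting function, so the colour mapping can be resolved by name rather than from an external vector.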