 

Colouring ggplot's plotmatrix by k-means clusters?

Tags: r, ggplot2, k-means

I am trying to create a pairs plot of six variables using ggplot2, with the points coloured according to the k-means cluster they belong to. I have read the documentation of the highly impressive 'GGally' package as well as an informal fix by Adam Laiacano [http://adamlaiacano.tumblr.com/post/13501402316/colored-plotmatrix-in-ggplot2]. Unfortunately, I could not find a way to get the desired output with either.

Here is some sample code:

# The Swiss fertility dataset is used here

data_ <- read.csv("/home/tejaskale/Ubuntu One/IUCAA/Datasets/swiss.csv", header=TRUE)
data_ <- na.omit(data_)

u <- c(2, 3, 4, 5, 6, 7)
x <- data_[,u]
k <- 3
maxIterations <- 100
noOfStarts <- 100
filename <- 'swiss.csv'

library(ggplot2)
library(gridExtra)
library(GGally)

kmeansOutput <- kmeans(x, k, maxIterations, noOfStarts)

xNew <- cbind(x[,1:6], as.factor(kmeansOutput$cluster))
names(xNew)[7] <- 'cluster'
kmeansPlot <- ggpairs(xNew[,1:6], color=xNew$cluster)

OR

kmeansPlot <- plotmatrix(xNew[,1:6], mapping=aes(colour=xNew$cluster))

Both plots are created but aren't coloured according to clusters.

I hope I haven't missed an existing answer to this question on the forum, and I apologize if that is the case. Any help would be greatly appreciated.

Thanks!

asked Oct 07 '22 05:10 by tejas_kale



1 Answer

The following slight modification of plotmatrix2 works fine for me:

plotmatrix2 <- function (data, mapping = aes())
{
    # One row per off-diagonal panel: every ordered pair of distinct columns
    grid <- expand.grid(x = 1:ncol(data), y = 1:ncol(data))
    grid <- subset(grid, x != y)
    # Stack all panels into one long data frame; appending `data` keeps the
    # original columns (e.g. cyl) available to the aesthetic mapping
    all <- do.call("rbind", lapply(1:nrow(grid), function(i) {
        xcol <- grid[i, "x"]
        ycol <- grid[i, "y"]
        data.frame(xvar = names(data)[ycol], yvar = names(data)[xcol], 
            x = data[, xcol], y = data[, ycol], data)
    }))
    # Facet variables as factors, ordered like the columns of `data`
    all$xvar <- factor(all$xvar, levels = names(data))
    all$yvar <- factor(all$yvar, levels = names(data))
    # Diagonal panels: the values for one density curve per column
    densities <- do.call("rbind", lapply(1:ncol(data), function(i) {
        data.frame(xvar = names(data)[i], yvar = names(data)[i], 
            x = data[, i])
    }))
    densities$xvar <- factor(densities$xvar, levels = names(data))
    densities$yvar <- factor(densities$yvar, levels = names(data))
    # Add x = x, y = y to the user-supplied mapping (e.g. colour)
    mapping <- defaults(mapping, aes_string(x = "x", y = "y"))
    class(mapping) <- "uneval"
    ggplot(all) + facet_grid(xvar ~ yvar, scales = "free") + 
        geom_point(mapping, na.rm = TRUE) + stat_density(aes(x = x, 
        y = ..scaled.. * diff(range(x)) + min(x)), data = densities, 
        position = "identity", colour = "grey20", geom = "line")
}


plotmatrix2(mtcars[,1:3],aes(colour = factor(cyl)))
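The reshaping that plotmatrix2 performs before plotting can be run on its own to see why the mapping is allowed to refer to cyl. This sketch copies the grid/all steps of the function above onto the same mtcars[, 1:3] example, without calling ggplot2: every panel's block of rows carries the original columns, and the explicit factor levels keep the facets in column order rather than alphabetical order.

```r
# Reproduces the reshaping step inside plotmatrix2 (no plotting),
# using the same mtcars[, 1:3] example as above.
data <- mtcars[, 1:3]

# All ordered pairs of distinct columns: 3 * 3 - 3 = 6 off-diagonal panels
grid <- expand.grid(x = 1:ncol(data), y = 1:ncol(data))
grid <- subset(grid, x != y)

# One block of rows per panel; the trailing `data` argument appends the
# original columns (mpg, cyl, disp) to every block
all <- do.call("rbind", lapply(1:nrow(grid), function(i) {
  xcol <- grid[i, "x"]
  ycol <- grid[i, "y"]
  data.frame(xvar = names(data)[ycol], yvar = names(data)[xcol],
             x = data[, xcol], y = data[, ycol], data)
}))

# Without explicit levels, factor() sorts the names alphabetically
levels(factor(names(data)))   # "cyl"  "disp" "mpg"

# Explicit levels keep the facets in the same order as the columns
all$xvar <- factor(all$xvar, levels = names(data))
levels(all$xvar)              # "mpg"  "cyl"  "disp"

dim(all)                      # 192 7  (6 panels x 32 cars; 4 + 3 columns)
"cyl" %in% names(all)         # TRUE, so aes(colour = factor(cyl)) can find it
```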


It may be a ggplot2 version issue, but I had to force the faceting variables in the densities data frame to be factors (that seems broken to me even in the GGally version). Also, as a general rule, don't pass vectors such as xNew$cluster to aes(); put the variable in the data frame and refer to it by its column name.
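Applied to the question's own data, that advice could look like the minimal sketch below: the cluster labels become a column of the data frame, and the mapping names that column. Assumptions: R's built-in swiss data frame stands in for swiss.csv (columns 2:7 of the file are taken to be the same six variables), and the plot uses the mapping argument of current GGally releases, so the call may need adjusting for other GGally versions. The plotting step is skipped when GGally is not installed.

```r
# Assumption: the built-in `swiss` data frame holds the same six variables
# as columns 2:7 of the question's swiss.csv.
set.seed(42)  # k-means uses random starts; fix the seed for reproducibility
x <- swiss

# Same settings as the question, with the arguments named:
# kmeans(x, k, maxIterations, noOfStarts) is kmeans(x, centers, iter.max, nstart)
kmeansOutput <- kmeans(x, centers = 3, iter.max = 100, nstart = 100)

# One data frame holding both the variables and the cluster labels
xNew <- data.frame(x, cluster = factor(kmeansOutput$cluster))

# Plot only the six numeric columns, coloured by the `cluster` column
# (referenced by name inside aes(), not as the vector xNew$cluster).
if (requireNamespace("GGally", quietly = TRUE)) {
  print(GGally::ggpairs(xNew, columns = 1:6,
                        mapping = ggplot2::aes(colour = cluster)))
}
```

Because cluster is a column of xNew, ggplot2 can look it up row by row after the data are reshaped into panels; a free-standing 47-element vector such as xNew$cluster has no such link to the reshaped rows, which is likely why the original calls drew uncoloured points.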

answered Oct 12 '22 20:10 by joran
