 

How to create a cluster plot in R?

How can I create a cluster plot in R without using clusplot?

I am trying to get to grips with some clustering (using R) and visualisation (using HTML5 Canvas).

Basically, I want to create a cluster plot, but instead of plotting the data I want to get a set of 2D points or coordinates that I can pull into a canvas and do something pretty with (I am unsure of how to do this). I imagine that I would:

  1. Create a distance (dissimilarity) matrix for the entire dataset (using dist)
  2. Cluster the matrix using kmeans or something similar (using kmeans)
  3. Plot the result using MDS or PCA (cmdscale), but I am unsure of how steps 2 and 3 relate.
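The three steps can be sketched in base R roughly as follows. This is a minimal sketch, assuming the built-in USArrests data as a stand-in for the real dataset and an arbitrary choice of 4 clusters. Note that `dist` returns distances (dissimilarities), not similarities, and that kmeans expects a data matrix (rows = observations), so it is run on the scaled data rather than on the distance matrix. Steps 2 and 3 are linked only through row order: row i of the MDS coordinates and element i of the cluster vector describe the same observation.

```r
# Stand-in data: built-in USArrests (50 states x 4 numeric variables)
x <- scale(USArrests)            # standardise columns so no variable dominates

# Step 1: Euclidean distance (dissimilarity) matrix
d <- dist(x)

# Step 2: k-means on the data matrix itself (k = 4 is an arbitrary choice)
set.seed(1)
km <- kmeans(x, centers = 4, nstart = 25)

# Step 3: classical MDS of the same distances down to 2 dimensions
xy <- cmdscale(d, k = 2)

# Join by row order: row i of xy and km$cluster[i] refer to the same state
pts <- data.frame(
  name      = rownames(USArrests),
  x         = xy[, 1],
  y         = xy[, 2],
  cluster   = km$cluster,
  row.names = NULL
)
head(pts)
```

For Euclidean distances, classical MDS gives the same 2D configuration as the first two principal component scores (up to sign flips of the axes), so `prcomp(x)$x[, 1:2]` would serve equally well for step 3.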

I've checked out questions here, here and here (with the last one being of most use).

slotishtype asked Jan 26 '12 14:01




1 Answer

Did you mean something like this? Sorry, I know nothing about HTML5 Canvas, only R, but I hope it helps.

First I cluster the data using kmeans (note that I did not cluster the distance matrix), then I compute the distance matrix and ordinate it with cmdscale (classical multidimensional scaling). Then I colour the points of the MDS plot according to the groups identified by kmeans, and add a few extra graphical features (spiders and hulls from vegan).

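You can access the coordinates from the object created by cmdscale: by default it returns a matrix with one row per observation and one column per dimension (two, unless you change k).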
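To get from those coordinates to something an HTML5 canvas can draw, one option is to rescale them to pixel positions and write them to a file the page can load. This is a minimal sketch, assuming the built-in `eurodist` road distances as stand-in data, a hypothetical 800 x 600 pixel canvas, and a hypothetical helper `to_canvas()`; canvas y coordinates grow downward, so the y axis is flipped.

```r
# Classical MDS of the built-in eurodist road distances (21 European cities)
xy <- cmdscale(eurodist, k = 2)

# Hypothetical helper: map MDS coordinates onto a width x height canvas,
# keeping a margin and using one common scale factor for both axes
to_canvas <- function(xy, width = 800, height = 600, margin = 20) {
  s <- min((width - 2 * margin) / diff(range(xy[, 1])),
           (height - 2 * margin) / diff(range(xy[, 2])))
  data.frame(
    label     = rownames(xy),
    px        = margin + (xy[, 1] - min(xy[, 1])) * s,
    py        = margin + (max(xy[, 2]) - xy[, 2]) * s,  # flip y: canvas y points down
    row.names = NULL
  )
}

pts <- to_canvas(xy)

# Write a plain CSV that JavaScript on the page can fetch and parse
out <- file.path(tempdir(), "mds_points.csv")
write.csv(pts, out, row.names = FALSE)
```

A single scale factor is shared by both axes so that distances on the canvas stay proportional to the MDS distances; scaling x and y separately would stretch the layout. The same helper should accept the `cmd` matrix from the code below, e.g. `to_canvas(cmd)`.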

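### some sample data: the dune vegetation data (20 sites x 30 species) from vegan
library(vegan)
data(dune)

# kmeans on the raw data (not on the distance matrix);
# fix the seed so the cluster labels are reproducible across runs
set.seed(42)
kclus <- kmeans(dune, centers = 4, iter.max = 1000, nstart = 10000)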

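# Euclidean distance matrix between the sites
dune_dist <- dist(dune)

# classical multidimensional scaling down to k = 2 dimensions;
# cmd is a matrix with one row per site and two columns of coordinates
cmd <- cmdscale(dune_dist, k = 2)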

# plot MDS, with colors by groups from kmeans
ordiplot(cmd, type = "n")
cols <- c("steelblue", "darkred", "darkgreen", "pink")
# kclus$cluster holds the integers 1..4, so it indexes cols directly
points(cmd, col = cols[kclus$cluster], pch = 16)

# add spider and hull
ordispider(cmd, factor(kclus$cluster), label = TRUE)
ordihull(cmd, factor(kclus$cluster), lty = "dotted")
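If vegan is not available, roughly the same figure (points coloured by cluster, a spider to each cluster centre and a hull around each cluster) can be drawn with base graphics. This is a minimal sketch, assuming the built-in USArrests data in place of dune and 4 clusters; `chull()` from grDevices stands in for `ordihull`, and the spider lines of `ordispider` are approximated with `segments()` to each cluster's unweighted centroid in MDS space.

```r
# Stand-in data and the same cluster-then-ordinate pipeline as above
x  <- scale(USArrests)
set.seed(1)
km <- kmeans(x, centers = 4, nstart = 25)
xy <- cmdscale(dist(x), k = 2)
cols <- c("steelblue", "darkred", "darkgreen", "pink")

# empty plot with equal axis scaling, then points coloured by cluster
plot(xy, type = "n", asp = 1, xlab = "MDS1", ylab = "MDS2")
points(xy, col = cols[km$cluster], pch = 16)

for (k in sort(unique(km$cluster))) {
  g   <- xy[km$cluster == k, , drop = FALSE]
  cen <- colMeans(g)
  # spider: a segment from the cluster centroid to each member
  segments(cen[1], cen[2], g[, 1], g[, 2], col = cols[k])
  # hull: chull() returns the indices of the hull vertices in order
  h <- chull(g)
  polygon(g[h, , drop = FALSE], border = cols[k], lty = "dotted")
  # label each cluster at its centroid
  text(cen[1], cen[2], labels = k, font = 2)
}
```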


EDi answered Oct 24 '22 15:10