
Correlation clustering in R

I'd like to use correlation clustering and I figure R is a good place to start.

I can present the data to R as a set of large, sparse vectors or as a table with a pre-computed dissimilarity matrix.

My questions are:

  • Are there existing R functions that turn this into a hierarchical clustering with agnes using correlation clustering?
  • Or will I have to implement the (admittedly simple) correlation clustering function by hand? If so, how do I make it play well with agnes?
daveb asked Sep 23 '09 23:09



2 Answers

I admittedly know very little about this subject, but just to point you in a direction:

  • Have you looked at the cluster package? It has very good documentation. In particular, look at help(agnes) for some suggestions. Martin Maechler (a member of the R core team) created the package and has contributed to Stack Overflow discussions before, so hopefully he'll provide an answer here.
  • The hclust() function is part of the stats package. In fact, I believe that there are plans to merge hclust() and agnes().
  • You may also find this page from the Bioconductor project helpful.
  • Otherwise, you may have some luck looking at other packages listed in the CRAN Task Views on Clustering, Natural Language Processing or Machine Learning.
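
To make the agnes suggestion concrete, here is a minimal sketch of feeding a pre-computed dissimilarity to agnes(). It assumes that "correlation clustering" here means hierarchical clustering on a correlation-based dissimilarity (1 - correlation), which may not be exactly what the question intends. The toy matrix x and the names d, fit and groups are made up for illustration.

```r
library(cluster)  # provides agnes()

set.seed(1)
# Toy data (assumption): 6 observations in rows, 10 features in columns
x <- matrix(rnorm(60), nrow = 6,
            dimnames = list(paste0("obs", 1:6), NULL))

# cor() correlates columns, so transpose to correlate the observations.
# 1 - r maps perfectly correlated rows to dissimilarity 0.
d <- as.dist(1 - cor(t(x)))

# diss = TRUE tells agnes() that d is already a dissimilarity,
# so it does not compute distances itself
fit <- agnes(d, diss = TRUE, method = "average")

# Convert to an hclust object to reuse cutree() and other stats tools
groups <- cutree(as.hclust(fit), k = 2)
print(groups)
```

If the dissimilarity matrix is already computed elsewhere, passing it through as.dist() and then to agnes(..., diss = TRUE) should work the same way, with no custom clustering function needed.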
Shane answered Oct 24 '22 03:10


The standard approach would combine cor(), hclust() and plot.hclust(). For visualizing the result, I'd highly recommend heatmap.2() from the wonderful gplots package.
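
A rough sketch of that pipeline, using only base R (stats and graphics). The toy data, the choice of 1 - correlation as the dissimilarity, average linkage and k = 3 are all assumptions for illustration, not part of the original answer.

```r
set.seed(42)
# Toy data (assumption): 8 items in rows, 10 measurements in columns
x <- matrix(rnorm(80), nrow = 8,
            dimnames = list(paste0("item", 1:8), NULL))

# Pairwise Pearson correlation between items (rows), hence the transpose
r <- cor(t(x))

# Turn correlation into a dissimilarity: highly correlated items are close
d <- as.dist(1 - r)

# Agglomerative hierarchical clustering on the dissimilarity
hc <- hclust(d, method = "average")

# plot() dispatches to plot.hclust() and draws the dendrogram
plot(hc, main = "Clustering on 1 - correlation")

# Cut the tree into a fixed number of clusters
clusters <- cutree(hc, k = 3)
print(clusters)
```

Using 1 - abs(r) instead of 1 - r would treat strongly negatively correlated items as similar; which of the two is appropriate depends on the data.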

drmjc answered Oct 24 '22 03:10