
R k-means algorithm custom centers

Tags: r, k-means

I have a 2D dataset of (x, y) coordinates imported into R. I want to perform k-means clustering on this dataset, but I would like to use specific coordinates as the initial centers. For example, I would like to start with 5 centers at (5, 10), (3, 8), (46, 22), (87, 66), (39, 41).

I saw a centers parameter in the kmeans function, but I do not understand how to set my values as the centers.

kmeans(data, centers = ...) # what to set here?
Bob asked Apr 13 '15 12:04


2 Answers

The centers parameter takes either an integer k, in which case k random rows of data are chosen as initial centers, or a matrix of initial centers with one row per center and as many columns as data. Try this:

x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
kmeans(x, centers = 3)         # 3 random rows of x as initial centers
kmeans(x, centers = x[1:3, ])  # first 3 rows of x as initial centers
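
Applied to the five centers from the question, a minimal sketch could look like the following. The dataset here is simulated, since the asker's data is not shown; the names data, init_centers and km, the seed, and the iter.max value are assumptions made only for illustration.

```r
# Simulated stand-in for the asker's 2D dataset (assumption: real data not shown)
set.seed(42)
data <- cbind(x = runif(2000, 0, 100), y = runif(2000, 0, 100))

# One row per initial center, one column per coordinate.
# byrow = TRUE lets each center be typed as an (x, y) pair.
init_centers <- matrix(c( 5, 10,
                          3,  8,
                         46, 22,
                         87, 66,
                         39, 41),
                       ncol = 2, byrow = TRUE)

# Passing a matrix makes kmeans start from exactly these centers
km <- kmeans(data, centers = init_centers, iter.max = 50)

km$centers         # final centers after the iterations
table(km$cluster)  # number of points assigned to each cluster
```

Note that kmeans stops with an error if one of the supplied centers ends up with no points assigned to it, so the initial centers should lie reasonably close to the data.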
Stephan Kolassa answered Oct 17 '22 00:10


Just pass a matrix. Here is a quick example:

data = matrix(c(1.1,1,0.97,0.99,0.95,0.8,0.91,2.1,2,2.4,4.1,4.4,4.5,3.9,1.5,1.2,1.7,2.6,2.7,2.44), ncol=2)

Now let's specify 2 starting points, C1 (x1=1, y1=3) and C2 (x2=2, y2=4), even though there are obviously 3 groups:

km = kmeans(data, centers=matrix(c(1,2,3,4),ncol=2))
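
The starting points come out as (1, 3) and (2, 4) rather than (1, 2) and (3, 4) because matrix() fills column by column by default. A small base-R check of that layout, with centers_col and centers_row as illustrative names:

```r
# Default fill is column-wise: first column gets 1, 2; second column gets 3, 4
centers_col <- matrix(c(1, 2, 3, 4), ncol = 2)
centers_col
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4

# byrow = TRUE lets each center be typed as an (x, y) pair instead
centers_row <- matrix(c(1, 3,
                        2, 4), ncol = 2, byrow = TRUE)

identical(centers_col, centers_row)  # TRUE: both describe C1 = (1, 3), C2 = (2, 4)
```

Writing centers with byrow = TRUE tends to be easier to read when there are more than a couple of them.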

Some plotting after applying the algorithm:

library(ggplot2)
df = transform(as.data.frame(data), group=as.character(km$cluster))

ggplot(df, aes(V1, V2, color=group)) + geom_point()

Colonel Beauvel answered Oct 17 '22 00:10