Spatial clustering in R (simple example)

I have this simple data.frame:

lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)
data <- data.frame(lat, lon)

The idea is to find spatial clusters based on the distance between points.

First, I plot the map (lon, lat):

plot(data$lon,data$lat)

[image: scatter plot of the points]

So clearly there are three clusters, based on the distances between the points.

To find them, I tried this code in R:

d <- as.matrix(dist(cbind(data$lon, data$lat)))  # create the distance matrix
d <- ifelse(d < 5, d, 0)                         # keep only distances < 5
d <- as.dist(d)
hc <- hclust(d)                                  # hierarchical clustering
plot(hc)
data$clust <- cutree(hc, k = 3)                  # cut the dendrogram into 3 clusters

This gives:

[image: dendrogram produced by plot(hc)]

Now I try to plot the same points, colored by cluster:

plot(data$lon, data$lat, col = c("red", "blue", "green")[data$clust], pch = 19)

Here is the result:

[image: the points colored by cluster]

This is not what I'm looking for.

What I actually want is something like this plot:

[image: the desired clustering]

Thank you for your help.

Asked by Math, Feb 23 '15 at 11:02



2 Answers

What about something like this:

lat<-c(1,2,3,10,11,12,20,21,22,23)
lon<-c(5,6,7,30,31,32,50,51,52,53)

km <- kmeans(cbind(lat, lon), centers = 3)
plot(lon, lat, col = km$cluster, pch = 20)

[image: the points colored by k-means cluster]
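
One caveat: kmeans() starts from randomly chosen centers, so the cluster numbering, and occasionally the partition itself, can differ between runs. A minimal sketch of one way to make the result reproducible, assuming a fixed seed and several random restarts (the values 42 and 25 are arbitrary choices, not part of the answer above):

```r
lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)

set.seed(42)                        # arbitrary seed, only for reproducibility
km <- kmeans(cbind(lat, lon),
             centers = 3,
             nstart  = 25)          # keep the best of 25 random starts

cluster_sizes <- sort(as.vector(table(km$cluster)))
print(cluster_sizes)                # sizes of the three clusters, smallest first
plot(lon, lat, col = km$cluster, pch = 20)
```

Note that k-means here works on raw Euclidean distances between (lat, lon) pairs, which is fine for this toy data but is not a geographic distance.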

Answered by johannes, Sep 27 '22 at 19:09


Here's a different approach. First, it assumes that the coordinates are WGS-84 (longitude/latitude) and not UTM (flat). Then it groups all neighbors within a given radius into the same cluster using hierarchical clustering (with method = "single", which adopts a 'friends of friends' clustering strategy).

To compute the distance matrix, I'm using the rdist.earth function from the package fields. The default earth radius in this package is 6378.388 km (the equatorial radius), which might not be what one is looking for, so I've changed it to 6371 km. See this article for more info.
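
For readers without the fields package, a great-circle distance matrix can be approximated in base R with the haversine formula. This is only a sketch: haversine_km is a hypothetical helper written for illustration, not part of fields, and rdist.earth may use a different formula internally, so the numbers can differ slightly. It uses the sample coordinates from this answer.

```r
# Hypothetical helper: great-circle distance in km between lon/lat points
# given in degrees, on a sphere of radius R (km).
haversine_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))   # pmin guards against rounding above 1
}

lon <- c(31.621785, 31.641773, 31.617269, 31.583895, 31.603284)
lat <- c(30.901118, 31.245008, 31.163886, 30.25058, 30.262378)

# Pairwise distance matrix in km
idx <- seq_along(lon)
dist_km <- outer(idx, idx, function(i, j) {
  haversine_km(lon[i], lat[i], lon[j], lat[j])
})
print(round(dist_km, 1))
```

The resulting matrix can be passed to as.dist() and hclust() in the same way as the output of rdist.earth.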

library(fields)
lon <- c(31.621785, 31.641773, 31.617269, 31.583895, 31.603284)
lat <- c(30.901118, 31.245008, 31.163886, 30.25058, 30.262378)
threshold.in.km <- 40
coors <- data.frame(lon, lat)   # rdist.earth expects longitude first, then latitude

# distance matrix (great-circle distances in km)
dist.in.km.matrix <- rdist.earth(coors, miles = FALSE, R = 6371)

# clustering: single linkage, cut at the distance threshold
fit <- hclust(as.dist(dist.in.km.matrix), method = "single")
clusters <- cutree(fit, h = threshold.in.km)

plot(lon, lat, col = clusters, pch = 20)

This could be a good solution if you don't know the number of clusters in advance (unlike the k-means option, which requires it), and it is somewhat related to the dbscan option with minPts = 1.
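
The 'friends of friends' behavior can be checked directly: cutting a single-linkage tree at height h should give the same groups as the connected components of the graph that links every pair of points at most h apart. The following is a base-R sketch under that assumption, using the question's data with plain Euclidean distances and h = 2; the label-propagation loop is an illustrative implementation, not code from any package.

```r
lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)
pts <- cbind(lon, lat)
h <- 2

# Adjacency: TRUE where two points are at most h apart (includes self)
adj <- as.matrix(dist(pts)) <= h

# Connected components by repeatedly taking the smallest label among neighbors
labels <- seq_len(nrow(pts))
repeat {
  new_labels <- vapply(seq_len(nrow(pts)),
                       function(i) min(labels[adj[i, ]]),
                       integer(1))
  if (all(new_labels == labels)) break
  labels <- new_labels
}
components <- match(labels, unique(labels))  # renumber as 1, 2, 3, ...

# Single-linkage clustering cut at the same threshold
single <- cutree(hclust(dist(pts), method = "single"), h = h)

print(all(components == single))
```

Because a chain of close neighbors ends up in one cluster, this strategy can merge groups that are joined by a thin "bridge" of points, which is worth keeping in mind on noisier data.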

---EDIT---

With the original data:

library(fields)
lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)
data <- data.frame(lat, lon)

# rdist.earth expects longitude in the first column and latitude in the second
d <- rdist.earth(data[, c("lon", "lat")], miles = FALSE, R = 6371)  # d <- dist(data) if data is UTM
fit <- hclust(as.dist(d), method = "single")
clusters <- cutree(fit, h = 1000)  # h = 2 if data is UTM
plot(lon, lat, col = clusters, pch = 20)
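
If a suitable threshold h is not known in advance, one possible heuristic is to inspect the merge heights of the single-linkage tree and cut inside the largest gap between consecutive merges. This is only a sketch under the assumption that the clusters are well separated, as they are in the question's data; it uses plain Euclidean distances and is not part of the original approach.

```r
lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)

fit <- hclust(dist(cbind(lon, lat)), method = "single")

# Merge heights are non-decreasing for single linkage
heights <- fit$height
gaps <- diff(heights)

# Cut halfway through the largest jump between consecutive merge heights
i <- which.max(gaps)
h <- mean(heights[c(i, i + 1)])

clusters <- cutree(fit, h = h)
print(table(clusters))
plot(lon, lat, col = clusters, pch = 20)
```

On data without a clear gap in the merge heights, this heuristic has no particular justification, and choosing h from domain knowledge (for example, a distance in km) is likely the safer option.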

Answered by Omri374, Sep 27 '22 at 19:09