I have this simple data.frame
lat<-c(1,2,3,10,11,12,20,21,22,23)
lon<-c(5,6,7,30,31,32,50,51,52,53)
data=data.frame(lat,lon)
The idea is to find the spatial clusters based on the distance
First, I plot the map (lon,lat) :
plot(data$lon,data$lat)
So clearly I have three clusters, based on the distances between the points.
For this aim, I've tried this code in R :
d= as.matrix(dist(cbind(data$lon,data$lat))) #Create distance matrix
d=ifelse(d<5,d,0) #keep only distance < 5
d=as.dist(d)
hc<-hclust(d) # hierarchical clustering
plot(hc)
data$clust <- cutree(hc,k=3) # cut the dendrogram to generate 3 clusters
This gives :
Now I try to plot the same points but with colors from clusters
plot(data$lon,data$lat, col=c("red","blue","green")[data$clust],pch=19)
Here are the results
Which is not what I'm looking for.
Actually, I want to find something like this plot
Thank you for help.
What about something like this:
lat<-c(1,2,3,10,11,12,20,21,22,23)
lon<-c(5,6,7,30,31,32,50,51,52,53)
km <- kmeans(cbind(lat, lon), centers = 3)
plot(lon, lat, col = km$cluster, pch = 20)
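One caveat with this approach: kmeans starts from random centers, so the cluster labels (and occasionally the partition itself) can differ between runs. A minimal sketch of a more reproducible call, assuming a fixed seed and several random starts are acceptable; the seed value and nstart = 25 are arbitrary choices, not part of the answer above:

```r
lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)

set.seed(42)  # arbitrary seed, only for reproducibility
# nstart = 25 runs k-means from 25 random starts and keeps the best fit
km <- kmeans(cbind(lon, lat), centers = 3, nstart = 25)

# Cluster sizes; the label numbers themselves carry no meaning
table(km$cluster)

plot(lon, lat, col = km$cluster, pch = 20)
```

Note that kmeans still needs the number of clusters up front and treats the coordinates as planar.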
Here's a different approach. First, it assumes that the coordinates are WGS-84 (longitude/latitude) and not UTM (flat). Then it puts all neighbors within a given radius into the same cluster using hierarchical clustering (with method = "single", which adopts a 'friends of friends' clustering strategy).
In order to compute the distance matrix, I'm using the rdist.earth function from the package fields. The default earth radius for this package is 6378.388 km (the equatorial radius), which might not be what one is looking for, so I've changed it to 6371. See this article for more info.
library(fields)
lon = c(31.621785, 31.641773, 31.617269, 31.583895, 31.603284)
lat = c(30.901118, 31.245008, 31.163886, 30.25058, 30.262378)
threshold.in.km <- 40
coors <- data.frame(lon,lat)
#distance matrix
dist.in.km.matrix <- rdist.earth(coors,miles = F,R=6371)
#clustering
fit <- hclust(as.dist(dist.in.km.matrix), method = "single")
clusters <- cutree(fit,h = threshold.in.km)
plot(lon, lat, col = clusters, pch = 20)
This could be a good solution if you don't know the number of clusters in advance (which the k-means option requires), and it is somewhat related to the dbscan option with minPts = 1.
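If the fields package is not available, the great-circle distance matrix can be approximated in base R. The following is only a sketch under the assumption of a spherical earth with radius 6371 km, using the haversine formula; haversine_matrix is a hypothetical helper name, not a function from fields or from the answer above:

```r
# Great-circle (haversine) distance matrix in km.
# lon and lat are vectors in decimal degrees; R is the assumed earth radius.
haversine_matrix <- function(lon, lat, R = 6371) {
  to_rad <- pi / 180
  phi <- lat * to_rad
  lambda <- lon * to_rad
  dphi <- outer(phi, phi, "-")          # pairwise latitude differences
  dlambda <- outer(lambda, lambda, "-")  # pairwise longitude differences
  a <- sin(dphi / 2)^2 + outer(cos(phi), cos(phi)) * sin(dlambda / 2)^2
  # pmin guards against rounding slightly above 1; matrix first keeps dims
  2 * R * asin(pmin(sqrt(a), 1))
}

lon <- c(31.621785, 31.641773, 31.617269, 31.583895, 31.603284)
lat <- c(30.901118, 31.245008, 31.163886, 30.25058, 30.262378)

d <- haversine_matrix(lon, lat)
fit <- hclust(as.dist(d), method = "single")
clusters <- cutree(fit, h = 40)  # same 40 km threshold as above
```

The rest of the workflow (hclust with single linkage, then cutree at a height in km) is unchanged.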
---EDIT---
With the original data:
lat<-c(1,2,3,10,11,12,20,21,22,23)
lon<-c(5,6,7,30,31,32,50,51,52,53)
data=data.frame(lat,lon)
dist <- rdist.earth(as.matrix(data[, c("lon", "lat")]),miles = F,R=6371) #rdist.earth expects lon first; dist <- dist(data) if data is UTM
fit <- hclust(as.dist(dist), method = "single")
clusters <- cutree(fit,h = 1000) #h = 2 if data is UTM
plot(lon, lat, col = clusters, pch = 20)
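The planar variant mentioned in the comments above can be sketched with base R alone. This assumes the coordinates are flat units and reuses the threshold of 5 from the question. Instead of setting distances of 5 or more to 0 (which tells hclust that far-apart points coincide), the full distance matrix is kept and the threshold is applied when cutting the tree:

```r
lat <- c(1, 2, 3, 10, 11, 12, 20, 21, 22, 23)
lon <- c(5, 6, 7, 30, 31, 32, 50, 51, 52, 53)
data <- data.frame(lat, lon)

# Euclidean distances, treating the coordinates as planar (an assumption)
d <- dist(data[, c("lon", "lat")])
fit <- hclust(d, method = "single")

# Points chained together by steps shorter than 5 units share a cluster
data$clust <- cutree(fit, h = 5)

plot(data$lon, data$lat,
     col = c("red", "blue", "green")[data$clust], pch = 19)
```

With single linkage, the cut height acts as the maximum gap allowed between neighboring points of one cluster, so the number of clusters does not have to be specified.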