How to cluster points and plot

Tags: plot, r, cluster-analysis

I am trying to use clustering in R. I am a rookie and haven't worked much with R.

I have geolocation points as latitude and longitude values, and I want to find the hotspots in this data.

I am looking to create clusters of 4 or more points that are within 600 feet of each other.

I want to get the centroids of such clusters and plot them.

The data looks like this:

LATITUDE    LONGITUD
32.70132    -85.52518
34.74251    -86.88351
32.55205    -87.34777
32.64144    -85.35430
34.92803    -87.81506
32.38016    -86.29790
32.42127    -87.08690
...

structure(list(LATITUDE = c(32.70132, 34.74251, 32.55205, 32.64144, 
34.92803, 32.38016, 32.42127, 32.9095, 33.58092, 32.51617, 33.5726, 
33.83251, 34.65639, 34.27694, 33.73851, 33.95132, 31.35445, 34.05263, 
33.37959, 30.50248, 32.31561, 32.66919, 31.75039, 33.56986, 33.27091, 
33.93598, 32.30964, 31.09773, 32.26711, 33.54263, 34.72014, 34.78548, 
30.65705, 31.25939, 31.27647, 30.54322, 31.22416, 33.38549, 33.18338, 
31.16811, 32.38368, 32.36253, 31.14464), LONGITUD = c(-85.52518, 
-86.88351, -87.34777, -85.3543, -87.81506, -86.2979, -87.0869, 
-85.75888, -86.27647, -86.21179, -86.65275, -87.2696, -85.72738, 
-87.71489, -86.48934, -86.29693, -88.22943, -87.55328, -85.31454, 
-87.79342, -86.88108, -86.26669, -88.04425, -86.44631, -87.74383, 
-87.72403, -86.28067, -85.4449, -87.62541, -86.56251, -86.48971, 
-85.59656, -88.24491, -86.60828, -86.18112, -88.22778, -85.63784, 
-86.03297, -87.55456, -85.37719, -86.38047, -86.21579, -86.86606
)), .Names = c("LATITUDE", "LONGITUD"), class = "data.frame", row.names = c(NA, 
-43L))

There are 30,800 entries (geo locations) in the above data frame. I have given a sample above.

I cannot use k-means because it needs the number of clusters to be specified in advance, and I do not know that number here. Clusters should consist of 4 or more points that are within a distance of about 600 ft of each other.

As an initial step, I tried to plot all the latitude and longitude points to get an idea of what the data looks like, so that I can later check whether the plot of the resulting clusters resembles it.

plot(dbfvar[,1], dbfvar[,2], type="l") #dbfvar is the dataframe having above data.

The plot was not satisfactory. It was not as expected.

The main part is to create the clusters and obtain the centroids of them, and visualize the centroids of the clusters formed.

P.S.: I am not confined to R; I can use Python as well. I am looking for a good solution to the above problem before I go ahead and implement it over 7 such files (each with 30,800 geolocations).

asked Oct 24 '14 03:10

user3543477

1 Answer

Hierarchical clustering is one approach.

First you construct a dendrogram:

dend <- hclust(dist(theData), method="complete")

I am using "complete" linkage here, so that groups are merged by the maximum-distance "rule": the distance between two groups is the largest distance between any pair of their points. This will be useful later if we want to make sure that all of the points in one group are at most a certain distance apart.

I chose a cutting distance of 2 because I am not sure how to convert your latitudes and longitudes to feet; you should convert first and then use 600 instead of 2. Here is the resulting dendrogram with the cut at a height of 2:

plot(dend, hang=-1)
abline(h=2, col="red", lty=2)  # horizontal dashed line marking the cut height

[Figure: dendrogram]

Now each subtree intersected by the red line will become one cluster.

groups <- cutree(dend, h=2) # cutree() takes the hclust tree; change "h" to 600 after converting to feet.
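
One possible way to do the conversion to feet mentioned above is to compute great-circle distances directly from the coordinates and pass them to hclust() in place of dist(theData). The following is only a sketch under stated assumptions: the helper name haversine_ft is made up here, the Earth is treated as a sphere with a mean radius of 6,371,008.8 m, and the four rows are taken from the sample in the question.

```r
# Sketch only: haversine (great-circle) distances in feet for all pairs of points.
# Assumptions: spherical Earth, mean radius 6371008.8 m; `haversine_ft` is a
# hypothetical helper name, not a function from any package.
haversine_ft <- function(lat, lon) {
  r_ft <- 6371008.8 / 0.3048            # Earth radius converted from metres to feet
  lat  <- lat * pi / 180                # degrees to radians
  lon  <- lon * pi / 180
  dlat <- outer(lat, lat, "-")          # pairwise latitude differences
  dlon <- outer(lon, lon, "-")          # pairwise longitude differences
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  as.dist(2 * r_ft * asin(pmin(sqrt(a), 1)))   # pmin guards against rounding above 1
}

# First four rows of the sample data from the question.
theData <- data.frame(
  LATITUDE = c(32.70132, 34.74251, 32.55205, 32.64144),
  LONGITUD = c(-85.52518, -86.88351, -87.34777, -85.35430)
)

d_ft   <- haversine_ft(theData$LATITUDE, theData$LONGITUD)
dend   <- hclust(d_ft, method = "complete")
groups <- cutree(dend, h = 600)         # the cut height is now in feet
```

Note that this sketch builds full n-by-n matrices. For 30,800 points that means roughly 474 million pairwise distances (several gigabytes as doubles), so it is better suited to a sample or a regional subset than to a whole file at once.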

We can plot them as a scatter plot to see how they look:

plot(theData, col=groups)

[Figure: scatter plot of the points colored by cluster]

Promising. Nearby points form clusters, which is what we wanted.
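
The question asks only for clusters with 4 or more points, which cutree() by itself does not enforce. A minimal sketch of that filtering step, with centroids taken as per-cluster means, might look like the following. The toy coordinates and the names sizes, keep, in_hot, hot, and centroids are illustrative assumptions, not part of the data or the approach above.

```r
# Hypothetical toy data: one clump of 5 points, one clump of 4, and two isolated points.
theData <- data.frame(
  x = c(0, 0.1, 0.2, 0.1, 0.0, 10, 10.1, 10.2, 10.1, 50, 90),
  y = c(0, 0.1, 0.0, 0.2, 0.2, 10, 10.1, 10.0, 10.2, 50, 90)
)
groups <- cutree(hclust(dist(theData), method = "complete"), h = 2)

sizes  <- table(groups)                          # number of points in each cluster
keep   <- as.integer(names(sizes)[sizes >= 4])   # ids of clusters with 4 or more points
in_hot <- groups %in% keep                       # TRUE for points in a kept cluster
hot    <- theData[in_hot, ]

# One row per kept cluster: cluster id plus the mean of each coordinate.
centroids <- aggregate(hot, by = list(cluster = groups[in_hot]), FUN = mean)

# All points in grey, centroids of the kept clusters on top.
plot(theData, col = "grey")
points(centroids$x, centroids$y, pch = 19, col = "red")
```

Clusters with fewer than 4 points are simply dropped here; whether they should instead be reported as noise depends on how the hotspots are going to be used.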

Let's add centers and circles around those centers with the radius of 1 (so that the max distance within the circle is 2):

G1 <- tapply(theData[,1], groups, mean)  # mean of the first coordinate per group
G2 <- tapply(theData[,2], groups, mean)  # mean of the second coordinate per group

library(plotrix)  # for draw.circle()
plot(theData, col=groups)
points(G1, G2, col=seq_along(G1), cex=2, pch=19)  # one color per group center
for(i in seq_along(G1)) {  # draw a circle of radius 1 around each center
    draw.circle(G1[i], G2[i], 1, border=i, lty=3, lwd=3)
}

[Figure: clusters with their centers and circles of radius 1]

Drawing circles around the mean is evidently not the best way to capture all of the points within a cluster. Nevertheless, it can be verified visually that the maximum distance between the points in one group is 2 (just try shifting the circles a bit to encapsulate all of the points).
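
The same property can also be checked numerically rather than by eye. The sketch below uses hypothetical random points (not the data from the question) and computes the largest pairwise distance inside each cluster, which with complete linkage should never exceed the cut height.

```r
# Hypothetical random data, used only to illustrate the check.
set.seed(42)
pts <- matrix(runif(200, 0, 10), ncol = 2)   # 100 points in a 10 x 10 square
d   <- dist(pts)
h   <- 2
groups <- cutree(hclust(d, method = "complete"), h = h)

# Largest pairwise distance inside each cluster (0 for single-point clusters).
dm <- as.matrix(d)
diameters <- sapply(split(seq_len(nrow(pts)), groups),
                    function(idx) max(dm[idx, idx]))

all(diameters <= h)   # every cluster is at most h across
```

This holds because complete linkage merges two groups at a height equal to the largest distance between any pair of their points, so any cluster that survives a cut at h has no pair farther apart than h.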

answered Oct 08 '22 11:10

Karolis Koncevičius
