
r identify two populations in scatterplot

I am comparing two rasters with a simple cell-by-cell scatter plot, and I find that I have two seemingly different populations:

[Image: true scatterplot]

Now I am trying to extract the locations of each of these populations (for example, by isolating the row IDs) so I can see where they fall in the rasters and maybe understand why I get this behavior. Here is a reproducible example:
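set.seed(1)                  # fix the random draws so the example is reproducible
X <- seq(1, 1000, 1)
Z <- runif(1000, 1, 2)
A <- 1.2 * X * Z + 100       # upper population
B <- 0.6 * X * Z             # lower population
# Stack both populations: the first 1000 rows come from A, the last 1000 from B
df <- data.frame(X = c(X, X), Y = c(A, B))
plot(df$X, df$Y)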
[Image: sample scatter]
Also, my original data has some 1,000,000 rows, so the solution needs to support a large data frame as well. Any ideas on how I can isolate each of these groups?
Thanks

Ilik asked Apr 07 '26 15:04



1 Answer

Spectral clustering is useful for identifying clusters of points that have a clear boundary. A great advantage is that it is unsupervised, i.e. it does not rely much on human judgement, although the method is slow and some hyperparameters (e.g. the number of clusters) need to be supplied.

Below is the code for clustering. It takes a few minutes to run in your case.

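library(kernlab)
# Spectral clustering on the (X, Y) coordinates, asking for two clusters
specc_df <- specc(as.matrix(df), centers = 2)
# Colour each point by its cluster label
plot(df, col = specc_df)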

The result is a plot showing two clearly separated clusters of points.

[Image: obviously two groups of points]
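The question asks for the row IDs of each population, which the plot alone does not give. A minimal sketch of one way to pull them out follows. It assumes the cluster labels can be read from the `.Data` slot of the object returned by `specc()`; the names `labels`, `rows_by_cluster`, `cluster1` and `origin` are illustrative only, and the seed is added here just to make the run repeatable.

```r
library(kernlab)

# Rebuild the question's example data (seed added here for reproducibility)
set.seed(1)
X <- seq(1, 1000, 1)
Z <- runif(1000, 1, 2)
df <- data.frame(X = c(X, X), Y = c(1.2 * X * Z + 100, 0.6 * X * Z))

# Same clustering call as in the answer
specc_df <- specc(as.matrix(df), centers = 2)

# Assumption: the specc object extends a plain vector, so .Data holds
# one cluster label per row of df
labels <- specc_df@.Data

# Row indices of df grouped by cluster label
rows_by_cluster <- split(seq_len(nrow(df)), labels)

# Rows assigned to the first cluster
cluster1 <- df[rows_by_cluster[[1]], ]

# Cross-check against the known origin of each row in the simulated data
origin <- rep(c("A", "B"), each = length(X))
print(table(cluster = labels, origin = origin))
```

If the real data frame was built in raster cell order, these row indices should correspond to cell positions, but that depends on how the rasters were converted and is worth checking before mapping them back.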

raymkchow answered Apr 09 '26 06:04



