I am comparing two rasters with a simple cell-by-cell scatter plot, and I find that I have two seemingly different populations.
Now I am trying to extract the locations of each of these populations (e.g. by isolating the row IDs) so I can see where they fall in the rasters and maybe understand why I get this behavior. Here is a reproducible example:
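set.seed(42)  # fix the random draws so the example is reproducible
X <- seq(1, 1000, 1)
Z <- runif(1000, 1, 2)
A <- 1.2 * X * Z + 100  # upper population
B <- 0.6 * X * Z        # lower population
df <- data.frame(X = c(X, X), Y = c(A, B))
plot(df$X, df$Y)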

Also, my original data has some 1,000,000 rows, so the solution needs to support a large data frame as well.
Any ideas on how I can isolate each of these groups?
Thanks
Spectral clustering is useful for identifying clusters of points that have a clear boundary. A great advantage is that it is unsupervised, i.e. it does not rely much on human judgement, although the method is slow and some hyperparameters (e.g. the number of clusters) need to be supplied.
Below is the code for clustering. It takes a few minutes to run in your case.
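library(kernlab)
# Spectral clustering into two groups; specc() is given the data as a matrix
specc_df <- specc(as.matrix(df), centers = 2)
# Colour each point by its assigned cluster
plot(df, col = specc_df)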
The result is a plot that clearly shows two clusters of points.
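Since the question asks for the row IDs of each population, here is a minimal sketch of how cluster labels can be turned back into row indices. To keep it runnable without kernlab, the vector `labels` below is a hypothetical stand-in built from the known construction of the example (first 1000 rows are A, last 1000 are B); in practice it would be the per-row cluster assignment produced by the clustering step. The names `labels`, `rows_by_cluster` and `pop1` are mine, not from the original code.

```r
# Rebuild the example data from the question
set.seed(42)
X <- seq(1, 1000, 1)
Z <- runif(1000, 1, 2)
df <- data.frame(X = c(X, X), Y = c(1.2 * X * Z + 100, 0.6 * X * Z))

# ASSUMPTION: stand-in labels, one integer per row of df.
# Replace this with the cluster assignment from your clustering result.
labels <- rep(1:2, each = 1000)

# Row IDs of each population, as a list with one element per cluster
rows_by_cluster <- split(seq_len(nrow(df)), labels)

# Equivalent for a single cluster
rows_cluster1 <- which(labels == 1)

# Subset the data frame to one population using its row IDs
pop1 <- df[rows_by_cluster[["1"]], ]
```

Both `split()` and `which()` are vectorised base R functions, so this step should not be the bottleneck on a data frame with around 1,000,000 rows; the clustering itself is the expensive part.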
