Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R - Finding closest neighboring point and number of neighbors within a given radius, coordinates lat-long

I am trying to figure out how isolated certain points are within my data set. I am using two methods to determine isolation, the distance of the closest neighbor and the number of neighboring sites within a given radius. All my coordinates are in latitude and longitude

This is what my data looks like:

    pond            lat         long        area    canopy  avg.depth   neighbor    n.lat   n.long  n.distance  n.area  n.canopy    n.depth n.avg.depth radius1500     A10             41.95928    -72.14605   1500    66      60.61538462                                      AA006           41.96431    -72.121     250     0       57.77777778                                      Blacksmith      41.95508    -72.123803  361     77      71.3125                                      Borrow.Pit.1    41.95601    -72.15419   0       0       41.44444444                                      Borrow.Pit.2    41.95571    -72.15413   0       0       37.7                                         Borrow.Pit.3    41.95546    -72.15375   0       0       29.22222222                                      Boulder         41.918223   -72.14978   1392    98      43.53333333                                  

I want to put the name of the nearest neighboring pond in the column neighbor, its lat and long in n.lat and n.long, the distance between the two ponds in n.distance, and the area, canopy and avg.depth in each of the appropriate columns.

Second, I want to put the number of ponds within 1500m of the target pond into radius1500.

Does anyone know of a function or package that will help me calculate the distances/numbers that I want? If it's an issue, it won't be hard to enter the other data I need, but the nearest neighbor's name and distance, plus the number of ponds within 1500m is what I really need help with.

Thank you.

like image 746
user2934942 Avatar asked Feb 24 '14 02:02

user2934942


2 Answers

Best option is to use libraries sp and rgeos, which enable you to construct spatial classes and perform geoprocessing.

library(sp) library(rgeos) 

Read the data and transform them to spatial objects:

mydata <- read.delim('d:/temp/testfile.txt', header=T)  sp.mydata <- mydata coordinates(sp.mydata) <- ~long+lat  class(sp.mydata) [1] "SpatialPointsDataFrame" attr(,"package") [1] "sp" 

Now calculate pairwise distances between points

d <- gDistance(sp.mydata, byid=T) 

Find second shortest distance (closest distance is of point to itself, therefore use second shortest)

min.d <- apply(d, 1, function(x) order(x, decreasing=F)[2]) 

Construct new data frame with desired variables

newdata <- cbind(mydata, mydata[min.d,], apply(d, 1, function(x) sort(x, decreasing=F)[2]))  colnames(newdata) <- c(colnames(mydata), 'neighbor', 'n.lat', 'n.long', 'n.area', 'n.canopy', 'n.avg.depth', 'distance')  newdata             pond      lat      long area canopy avg.depth     neighbor    n.lat    n.long n.area n.canopy n.avg.depth 6            A10 41.95928 -72.14605 1500     66  60.61538 Borrow.Pit.3 41.95546 -72.15375      0        0    29.22222 3          AA006 41.96431 -72.12100  250      0  57.77778   Blacksmith 41.95508 -72.12380    361       77    71.31250 2     Blacksmith 41.95508 -72.12380  361     77  71.31250        AA006 41.96431 -72.12100    250        0    57.77778 5   Borrow.Pit.1 41.95601 -72.15419    0      0  41.44444 Borrow.Pit.2 41.95571 -72.15413      0        0    37.70000 4   Borrow.Pit.2 41.95571 -72.15413    0      0  37.70000 Borrow.Pit.1 41.95601 -72.15419      0        0    41.44444 5.1 Borrow.Pit.3 41.95546 -72.15375    0      0  29.22222 Borrow.Pit.2 41.95571 -72.15413      0        0    37.70000 6.1      Boulder 41.91822 -72.14978 1392     98  43.53333 Borrow.Pit.3 41.95546 -72.15375      0        0    29.22222         distance 6   0.0085954872 3   0.0096462277 2   0.0096462277 5   0.0003059412 4   0.0003059412 5.1 0.0004548626 6.1 0.0374480316 

Edit: if coordinates are in degrees and you would like to calculate distance in kilometers, use package geosphere

library(geosphere)  d <- distm(sp.mydata)  # rest is the same 

This should provide better results, if the points are scattered across the globe and coordinates are in degrees

like image 115
Zbynek Avatar answered Oct 01 '22 08:10

Zbynek


I add below an alternative solution using the newer sf package, for those interested and coming to this page now (as I did).

First, load the data and create the sf object.

# Using sf mydata <- structure(   list(pond = c("A10", "AA006", "Blacksmith", "Borrow.Pit.1",                  "Borrow.Pit.2", "Borrow.Pit.3", "Boulder"),         lat = c(41.95928, 41.96431, 41.95508, 41.95601, 41.95571, 41.95546,                 41.918223),         long = c(-72.14605, -72.121, -72.123803, -72.15419, -72.15413,                  -72.15375, -72.14978),         area = c(1500L, 250L, 361L, 0L, 0L, 0L, 1392L),         canopy = c(66L, 0L, 77L, 0L, 0L, 0L, 98L),         avg.depth = c(60.61538462, 57.77777778, 71.3125, 41.44444444,                       37.7, 29.22222222, 43.53333333)),    class = "data.frame", row.names = c(NA, -7L))   library(sf) data_sf <- st_as_sf(mydata, coords = c("long", "lat"),                     # Change to your CRS                     crs = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs") st_is_longlat(data_sf) 

sf::st_distance calculates the distance matrix in meters using Great Circle distance when using lat/lon data.

dist.mat <- st_distance(data_sf) # Great Circle distance since in lat/lon # Number within 1.5km: Subtract 1 to exclude the point itself num.1500 <- apply(dist.mat, 1, function(x) {   sum(x < 1500) - 1 })  # Calculate nearest distance nn.dist <- apply(dist.mat, 1, function(x) {   return(sort(x, partial = 2)[2]) }) # Get index for nearest distance nn.index <- apply(dist.mat, 1, function(x) { order(x, decreasing=F)[2] })  n.data <- mydata colnames(n.data)[1] <- "neighbor" colnames(n.data)[2:ncol(n.data)] <-    paste0("n.", colnames(n.data)[2:ncol(n.data)]) mydata2 <- data.frame(mydata,                       n.data[nn.index, ],                       n.distance = nn.dist,                       radius1500 = num.1500) rownames(mydata2) <- seq(nrow(mydata2)) 
mydata2           pond      lat      long area canopy avg.depth     neighbor    n.lat    n.long n.area n.canopy 1          A10 41.95928 -72.14605 1500     66  60.61538 Borrow.Pit.1 41.95601 -72.15419      0        0 2        AA006 41.96431 -72.12100  250      0  57.77778   Blacksmith 41.95508 -72.12380    361       77 3   Blacksmith 41.95508 -72.12380  361     77  71.31250        AA006 41.96431 -72.12100    250        0 4 Borrow.Pit.1 41.95601 -72.15419    0      0  41.44444 Borrow.Pit.2 41.95571 -72.15413      0        0 5 Borrow.Pit.2 41.95571 -72.15413    0      0  37.70000 Borrow.Pit.1 41.95601 -72.15419      0        0 6 Borrow.Pit.3 41.95546 -72.15375    0      0  29.22222 Borrow.Pit.2 41.95571 -72.15413      0        0 7      Boulder 41.91822 -72.14978 1392     98  43.53333 Borrow.Pit.3 41.95546 -72.15375      0        0   n.avg.depth n.distance radius1500 1    41.44444  766.38426          3 2    71.31250 1051.20527          1 3    57.77778 1051.20527          1 4    37.70000   33.69099          3 5    41.44444   33.69099          3 6    37.70000   41.99576          3 7    29.22222 4149.07406          0 

For obtaining the closest neighbor after calculating distance, you can use sort() with the partial = 2 argument. Depending on the amount of data, this could be much faster than using order as in the previous solution. The package Rfast is likely even faster but I avoid including additional packages here. See this related post for a discussion and benchmarking of various solutions: https://stackoverflow.com/a/53144760/12265198

like image 29
bzki Avatar answered Oct 01 '22 06:10

bzki