I am trying to figure out how isolated certain points are within my data set. I am using two methods to determine isolation, the distance of the closest neighbor and the number of neighboring sites within a given radius. All my coordinates are in latitude and longitude
This is what my data looks like:
pond lat long area canopy avg.depth neighbor n.lat n.long n.distance n.area n.canopy n.depth n.avg.depth radius1500 A10 41.95928 -72.14605 1500 66 60.61538462 AA006 41.96431 -72.121 250 0 57.77777778 Blacksmith 41.95508 -72.123803 361 77 71.3125 Borrow.Pit.1 41.95601 -72.15419 0 0 41.44444444 Borrow.Pit.2 41.95571 -72.15413 0 0 37.7 Borrow.Pit.3 41.95546 -72.15375 0 0 29.22222222 Boulder 41.918223 -72.14978 1392 98 43.53333333
I want to put the name of the nearest neighboring pond in the column neighbor, its lat and long in n.lat and n.long, the distance between the two ponds in n.distance, and the area, canopy and avg.depth in each of the appropriate columns.
Second, I want to put the number of ponds within 1500m of the target pond into radius1500.
Does anyone know of a function or package that will help me calculate the distances/numbers that I want? If it's an issue, it won't be hard to enter the other data I need, but the nearest neighbor's name and distance, plus the number of ponds within 1500m is what I really need help with.
Thank you.
Best option is to use libraries sp
and rgeos
, which enable you to construct spatial classes and perform geoprocessing.
library(sp) library(rgeos)
Read the data and transform them to spatial objects:
mydata <- read.delim('d:/temp/testfile.txt', header=T) sp.mydata <- mydata coordinates(sp.mydata) <- ~long+lat class(sp.mydata) [1] "SpatialPointsDataFrame" attr(,"package") [1] "sp"
Now calculate pairwise distances between points
d <- gDistance(sp.mydata, byid=T)
Find second shortest distance (closest distance is of point to itself, therefore use second shortest)
min.d <- apply(d, 1, function(x) order(x, decreasing=F)[2])
Construct new data frame with desired variables
newdata <- cbind(mydata, mydata[min.d,], apply(d, 1, function(x) sort(x, decreasing=F)[2])) colnames(newdata) <- c(colnames(mydata), 'neighbor', 'n.lat', 'n.long', 'n.area', 'n.canopy', 'n.avg.depth', 'distance') newdata pond lat long area canopy avg.depth neighbor n.lat n.long n.area n.canopy n.avg.depth 6 A10 41.95928 -72.14605 1500 66 60.61538 Borrow.Pit.3 41.95546 -72.15375 0 0 29.22222 3 AA006 41.96431 -72.12100 250 0 57.77778 Blacksmith 41.95508 -72.12380 361 77 71.31250 2 Blacksmith 41.95508 -72.12380 361 77 71.31250 AA006 41.96431 -72.12100 250 0 57.77778 5 Borrow.Pit.1 41.95601 -72.15419 0 0 41.44444 Borrow.Pit.2 41.95571 -72.15413 0 0 37.70000 4 Borrow.Pit.2 41.95571 -72.15413 0 0 37.70000 Borrow.Pit.1 41.95601 -72.15419 0 0 41.44444 5.1 Borrow.Pit.3 41.95546 -72.15375 0 0 29.22222 Borrow.Pit.2 41.95571 -72.15413 0 0 37.70000 6.1 Boulder 41.91822 -72.14978 1392 98 43.53333 Borrow.Pit.3 41.95546 -72.15375 0 0 29.22222 distance 6 0.0085954872 3 0.0096462277 2 0.0096462277 5 0.0003059412 4 0.0003059412 5.1 0.0004548626 6.1 0.0374480316
Edit: if coordinates are in degrees and you would like to calculate distance in kilometers, use package geosphere
library(geosphere) d <- distm(sp.mydata) # rest is the same
This should provide better results, if the points are scattered across the globe and coordinates are in degrees
I add below an alternative solution using the newer sf
package, for those interested and coming to this page now (as I did).
First, load the data and create the sf
object.
# Using sf mydata <- structure( list(pond = c("A10", "AA006", "Blacksmith", "Borrow.Pit.1", "Borrow.Pit.2", "Borrow.Pit.3", "Boulder"), lat = c(41.95928, 41.96431, 41.95508, 41.95601, 41.95571, 41.95546, 41.918223), long = c(-72.14605, -72.121, -72.123803, -72.15419, -72.15413, -72.15375, -72.14978), area = c(1500L, 250L, 361L, 0L, 0L, 0L, 1392L), canopy = c(66L, 0L, 77L, 0L, 0L, 0L, 98L), avg.depth = c(60.61538462, 57.77777778, 71.3125, 41.44444444, 37.7, 29.22222222, 43.53333333)), class = "data.frame", row.names = c(NA, -7L)) library(sf) data_sf <- st_as_sf(mydata, coords = c("long", "lat"), # Change to your CRS crs = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs") st_is_longlat(data_sf)
sf::st_distance
calculates the distance matrix in meters using Great Circle distance when using lat/lon data.
dist.mat <- st_distance(data_sf) # Great Circle distance since in lat/lon # Number within 1.5km: Subtract 1 to exclude the point itself num.1500 <- apply(dist.mat, 1, function(x) { sum(x < 1500) - 1 }) # Calculate nearest distance nn.dist <- apply(dist.mat, 1, function(x) { return(sort(x, partial = 2)[2]) }) # Get index for nearest distance nn.index <- apply(dist.mat, 1, function(x) { order(x, decreasing=F)[2] }) n.data <- mydata colnames(n.data)[1] <- "neighbor" colnames(n.data)[2:ncol(n.data)] <- paste0("n.", colnames(n.data)[2:ncol(n.data)]) mydata2 <- data.frame(mydata, n.data[nn.index, ], n.distance = nn.dist, radius1500 = num.1500) rownames(mydata2) <- seq(nrow(mydata2))
mydata2 pond lat long area canopy avg.depth neighbor n.lat n.long n.area n.canopy 1 A10 41.95928 -72.14605 1500 66 60.61538 Borrow.Pit.1 41.95601 -72.15419 0 0 2 AA006 41.96431 -72.12100 250 0 57.77778 Blacksmith 41.95508 -72.12380 361 77 3 Blacksmith 41.95508 -72.12380 361 77 71.31250 AA006 41.96431 -72.12100 250 0 4 Borrow.Pit.1 41.95601 -72.15419 0 0 41.44444 Borrow.Pit.2 41.95571 -72.15413 0 0 5 Borrow.Pit.2 41.95571 -72.15413 0 0 37.70000 Borrow.Pit.1 41.95601 -72.15419 0 0 6 Borrow.Pit.3 41.95546 -72.15375 0 0 29.22222 Borrow.Pit.2 41.95571 -72.15413 0 0 7 Boulder 41.91822 -72.14978 1392 98 43.53333 Borrow.Pit.3 41.95546 -72.15375 0 0 n.avg.depth n.distance radius1500 1 41.44444 766.38426 3 2 71.31250 1051.20527 1 3 57.77778 1051.20527 1 4 37.70000 33.69099 3 5 41.44444 33.69099 3 6 37.70000 41.99576 3 7 29.22222 4149.07406 0
For obtaining the closest neighbor after calculating distance, you can use sort()
with the partial = 2
argument. Depending on the amount of data, this could be much faster than using order
as in the previous solution. The package Rfast
is likely even faster but I avoid including additional packages here. See this related post for a discussion and benchmarking of various solutions: https://stackoverflow.com/a/53144760/12265198
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With