I have two data sets, A and B, which give the locations of different points in the UK, as follows:
A = data.frame(reference = c("C", "D", "E"), latitude = c(55.32043, 55.59062, 55.60859), longitude = c(-2.3954998, -2.0650243, -2.0650542))
B = data.frame(reference = c("C", "D", "E"), latitude = c(55.15858, 55.60859, 55.59062), longitude = c(-2.4252843, -2.0650542, -2.0650243))
A has 400 rows and B has 1800 rows.
For every row in A, I would like to find the three closest points in B, together with the distance in kilometers from the point in A to each of them, and the reference and latitude/longitude of each of those points in B.
I tried using this post: "R - Finding closest neighboring point and number of neighbors within a given radius, coordinates lat-long". However, even when I follow all the instructions, mainly using the command distm from the package geosphere, the distance comes up in a unit that can't possibly be kilometers. I don't see what to change in the code, especially since I am not familiar at all with the geo packages.
For points in the plane, the closest pair can be found by divide and conquer. To split the point set in two, find the x-median of the points and use it as a pivot. Finding the closest pair of points in each half is a subproblem that is solved recursively. The overall closest pair is then the closer of the two within-half closest pairs and the closest pair that straddles the two halves.
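A minimal R sketch of that divide-and-conquer scheme follows. It treats the two coordinates as planar x/y values (an assumption: raw latitude/longitude are not planar, so on geographic data this is only an approximation over small areas), and closest_pair is a hypothetical function name. For simplicity it re-sorts the strip by y at every level, which costs O(n log² n) instead of the optimal O(n log n).

```r
# Hypothetical helper: closest pair of points in the plane by divide and conquer.
# Assumes x and y are planar coordinates; returns the distance and the
# (original) indexes of the two points.
closest_pair <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  ord <- order(x)                      # sort once by x so each half is contiguous
  px <- x[ord]
  py <- y[ord]
  euclid <- function(i, j) sqrt((px[i] - px[j])^2 + (py[i] - py[j])^2)

  rec <- function(lo, hi) {
    n <- hi - lo + 1
    if (n <= 3) {
      # brute force on very small subsets
      best <- list(d = Inf, pair = c(NA, NA))
      for (i in lo:(hi - 1)) for (j in (i + 1):hi) {
        d <- euclid(i, j)
        if (d < best$d) best <- list(d = d, pair = c(i, j))
      }
      return(best)
    }
    mid <- lo + n %/% 2 - 1            # last index of the left half
    pivot <- px[mid]                   # x-median used as the pivot
    left <- rec(lo, mid)
    right <- rec(mid + 1, hi)
    best <- if (left$d <= right$d) left else right
    # pairs straddling the halves: only points closer than best$d to the
    # pivot line can improve on best, and only if they are close in y too
    strip <- (lo:hi)[abs(px[lo:hi] - pivot) < best$d]
    strip <- strip[order(py[strip])]
    k <- length(strip)
    if (k >= 2) {
      for (a in 1:(k - 1)) {
        b <- a + 1
        while (b <= k && py[strip[b]] - py[strip[a]] < best$d) {
          d <- euclid(strip[a], strip[b])
          if (d < best$d) best <- list(d = d, pair = c(strip[a], strip[b]))
          b <- b + 1
        }
      }
    }
    best
  }

  res <- rec(1, length(x))
  list(distance = res$d, indices = sort(ord[res$pair]))
}
```

Because the strip loop stops as soon as the y gap reaches the current best distance, each strip point is only compared with a bounded number of neighbours.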
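To use latitude and longitude in a spherical distance formula, first convert the coordinates of both points from degrees to radians by dividing them by 180/pi, which is approximately 57.29577951 (pi is often approximated as 22/7, but use the full-precision value, e.g. pi in R, for distance calculations). To calculate the distance between two places in miles, use 3,963 miles as the radius of the Earth (about 6,378 km if you want kilometers).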
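A small R sketch that applies this conversion inside the haversine formula. The choice of haversine (rather than, say, the spherical law of cosines) is my assumption, since the paragraph above does not name a formula, and haversine is a hypothetical helper name. The default radius is the 3,963 miles given above, so the result is in miles unless you pass another radius.

```r
# Hypothetical helper: haversine great-circle distance.
# Inputs in decimal degrees; the unit of the result is the unit of `radius`
# (default 3963 = Earth's radius in miles, as in the text above).
haversine <- function(lat1, lon1, lat2, lon2, radius = 3963) {
  to_rad <- function(deg) deg / (180 / pi)   # same as deg * pi / 180
  phi1 <- to_rad(lat1)
  phi2 <- to_rad(lat2)
  dphi <- to_rad(lat2 - lat1)
  dlambda <- to_rad(lon2 - lon1)
  a <- sin(dphi / 2)^2 + cos(phi1) * cos(phi2) * sin(dlambda / 2)^2
  # pmin guards against tiny rounding errors pushing sqrt(a) above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# First point of A to first point of B, in miles and then in kilometers
haversine(55.32043, -2.3954998, 55.15858, -2.4252843)
haversine(55.32043, -2.3954998, 55.15858, -2.4252843, radius = 6378)
```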
Excel has no built-in function for the distance between two coordinates; to calculate distances in Excel, enter the same formula in a cell using its RADIANS, SIN, COS and ACOS (or ASIN and SQRT) functions.
Here is the formula to find the second point when the first point (la1, lo1), the bearing θ and the distance d are known, where Ad = d/R is the angular distance (R is the radius of the Earth) and all angles are in radians:
latitude of second point: la2 = asin(sin la1 * cos Ad + cos la1 * sin Ad * cos θ)
longitude of second point: lo2 = lo1 + atan2(sin θ * sin Ad * cos la1, cos Ad - sin la1 * sin la2)
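The same formula written as an R function, as a sketch: destination_point is a hypothetical name, inputs and outputs are in degrees, the bearing is measured clockwise from north, and the default radius of 6371 km (the Earth's mean radius) is an assumption, so dist must be given in km. The resulting longitude is not normalised to [-180, 180].

```r
# Hypothetical helper: destination point from a start point, bearing and distance.
# Degrees in and out; dist in the same unit as radius (assumed km, R = 6371).
destination_point <- function(lat1, lon1, bearing, dist, radius = 6371) {
  rad <- pi / 180
  la1 <- lat1 * rad
  lo1 <- lon1 * rad
  theta <- bearing * rad
  Ad <- dist / radius                     # angular distance in radians
  la2 <- asin(sin(la1) * cos(Ad) + cos(la1) * sin(Ad) * cos(theta))
  lo2 <- lo1 + atan2(sin(theta) * sin(Ad) * cos(la1),
                     cos(Ad) - sin(la1) * sin(la2))
  c(latitude = la2 / rad, longitude = lo2 / rad)
}

# 10 km due east of the first point of A
destination_point(55.32043, -2.3954998, bearing = 90, dist = 10)
```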
Below is a solution using the spatialrisk package. The key functions in this package are written in C++ (Rcpp), and are therefore very fast.
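The function spatialrisk::points_in_circle returns the observations within a given radius (in meters) of a center point, ordered by distance as the output below shows, so [1:3, ] keeps the three nearest. Note that distances are calculated using the Haversine formula. Since each element of the result is a data frame, purrr::map2_dfr is used to iterate over the latitude/longitude pairs of A and row-bind the results together: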
# .x is the latitude and .y the longitude of each point in A
purrr::map2_dfr(A$latitude, A$longitude,
                ~spatialrisk::points_in_circle(B, .y, .x,
                                               lon = longitude,
                                               lat = latitude,
                                               radius = 1e6)[1:3, ],
                .id = "id_A")
id_A reference latitude longitude distance_m
1 1 C 55.15858 -2.425284 18115.958
2 1 E 55.59062 -2.065024 36603.447
3 1 D 55.60859 -2.065054 38260.562
4 2 E 55.59062 -2.065024 0.000
5 2 D 55.60859 -2.065054 2000.412
6 2 C 55.15858 -2.425284 53219.597
7 3 D 55.60859 -2.065054 0.000
8 3 E 55.59062 -2.065024 2000.412
9 3 C 55.15858 -2.425284 55031.092
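For comparison, here is a package-free sketch that builds the same kind of table (the three nearest points of B for each row of A) with the distance in kilometers. It assumes a spherical Earth with radius 6378.137 km inside a hand-written haversine_km helper (a hypothetical name), so the values can differ slightly from geosphere::distGeo, which works on the WGS84 ellipsoid.

```r
# Data from the question, with the reference column as character strings
A <- data.frame(reference = c("C", "D", "E"),
                latitude = c(55.32043, 55.59062, 55.60859),
                longitude = c(-2.3954998, -2.0650243, -2.0650542))
B <- data.frame(reference = c("C", "D", "E"),
                latitude = c(55.15858, 55.60859, 55.59062),
                longitude = c(-2.4252843, -2.0650542, -2.0650243))

# Hypothetical helper: haversine distance in km, vectorized over lat2/lon2
# (spherical Earth, assumed radius 6378.137 km)
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6378.137) {
  rad <- pi / 180
  a <- sin((lat2 - lat1) * rad / 2)^2 +
    cos(lat1 * rad) * cos(lat2 * rad) * sin((lon2 - lon1) * rad / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

k <- 3
nearest <- do.call(rbind, lapply(seq_len(nrow(A)), function(i) {
  # distances from point i of A to every point of B
  d <- haversine_km(A$latitude[i], A$longitude[i], B$latitude, B$longitude)
  idx <- order(d)[seq_len(min(k, nrow(B)))]   # indexes of the k nearest
  data.frame(id_A = i,
             reference = B$reference[idx],
             latitude = B$latitude[idx],
             longitude = B$longitude[idx],
             distance_km = d[idx])
}))
nearest
```

With 400 rows in A and 1800 in B this makes 400 vectorized calls over B, which should be quick in plain R.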
Here is a solution using a single loop, with the distance calculation vectorized over all of B (and converted to km).
The code uses base R's rank function to order the calculated distances.
The indexes and the calculated distances of the 3 shortest values are stored back in data frame A.
library(geosphere)
A = data.frame(longitude = c(-2.3954998, -2.0650243, -2.0650542), latitude = c(55.32043, 55.59062, 55.60859))
B = data.frame(longitude = c(-2.4252843, -2.0650542, -2.0650243), latitude = c(55.15858, 55.60859, 55.59062))
for (i in 1:nrow(A)) {
  # calculate the distance (in km) from point i of A to all of B;
  # select only the lon/lat columns, because the loop adds result columns to A
  distances <- geosphere::distGeo(A[i, c("longitude", "latitude")],
                                  B[, c("longitude", "latitude")]) / 1000
  # rank the calculated distances
  ranking <- rank(distances, ties.method = "first")
  # find the 3 shortest and store the indexes of B back in A
  A$shortest[i] <- which(ranking == 1)  # same as which.min()
  A$shorter[i]  <- which(ranking == 2)
  A$short[i]    <- which(ranking == 3)
  # store the distances back in A
  A$shortestD[i] <- distances[A$shortest[i]]  # same as min()
  A$shorterD[i]  <- distances[A$shorter[i]]
  A$shortD[i]    <- distances[A$short[i]]
}
A
longitude latitude shortest shorter short shortestD shorterD shortD
1 -2.395500 55.32043 1 3 2 18.11777 36.633310 38.28952
2 -2.065024 55.59062 3 2 1 0.00000 2.000682 53.24607
3 -2.065054 55.60859 2 3 1 0.00000 2.000682 55.05710
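The rank()-based lookup in the loop above can be checked on a toy vector (the numbers are made up for illustration). With ties.method = "first", which(ranking == k) is the index of the k-th smallest distance, which is the same as order(distances)[k], so inside the loop order(distances)[1:3] would return all three indexes at once.

```r
# Made-up distances (km) with a tie, to show how rank() and order() relate
distances <- c(38.3, 18.1, 36.6, 18.1)
ranking <- rank(distances, ties.method = "first")
ranking                 # position of each element in sorted order: 4 1 3 2
which(ranking == 1)     # 2, same as which.min(distances)
order(distances)[1:3]   # 2 4 3: indexes of the three smallest at once
```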
As M Viking pointed out, for the geosphere package the data must be arranged Lon then Lat.
The geosphere library has several functions to help you. distGeo returns meters.
Note the data must be arranged Lon then Lat.
library(geosphere)
A = data.frame(longitude = c(-2.3954998, -2.0650243, -2.0650542), latitude = c(55.32043, 55.59062, 55.60859))
B = data.frame(longitude = c(-2.4252843, -2.0650542, -2.0650243), latitude = c(55.15858, 55.60859, 55.59062))
geosphere::distGeo(A, B)
# > geosphere::distGeo(A, B)
# [1] 18117.765 2000.682 2000.682
This returns a vector of distances in meters; divide by 1000 to get kilometers.
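Note that distGeo(A, B) pairs the rows element-wise (row 1 of A with row 1 of B, and so on), so the three values above are not the distances from each point of A to every point of B. For the all-pairs matrix the question needs, geosphere::distm(A, B, fun = distGeo) returns an nrow(A) by nrow(B) matrix in meters. The sketch below shows the difference in base R, using a hand-written helper hav_m (a hypothetical name; spherical Earth with an assumed radius of 6378137 m, so values differ slightly from distGeo).

```r
# Sample data from the answer above (longitude first, then latitude)
A <- data.frame(longitude = c(-2.3954998, -2.0650243, -2.0650542),
                latitude  = c(55.32043, 55.59062, 55.60859))
B <- data.frame(longitude = c(-2.4252843, -2.0650542, -2.0650243),
                latitude  = c(55.15858, 55.60859, 55.59062))

# Hypothetical helper: haversine distance in meters, vectorized over all args
hav_m <- function(lon1, lat1, lon2, lat2, radius = 6378137) {
  rad <- pi / 180
  a <- sin((lat2 - lat1) * rad / 2)^2 +
    cos(lat1 * rad) * cos(lat2 * rad) * sin((lon2 - lon1) * rad / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Row-wise pairing, like distGeo(A, B): A[1] vs B[1], A[2] vs B[2], ...
pairwise_km <- hav_m(A$longitude, A$latitude, B$longitude, B$latitude) / 1000

# All pairs: rows are points of A, columns are points of B
all_km <- outer(seq_len(nrow(A)), seq_len(nrow(B)), function(i, j)
  hav_m(A$longitude[i], A$latitude[i], B$longitude[j], B$latitude[j])) / 1000

# Column indexes (points of B) of the three nearest neighbours of each A point
nearest3 <- t(apply(all_km, 1, order))[, 1:3]
```

The rows of nearest3 match the shortest/shorter/short columns produced by the loop answer, and the diagonal of all_km is exactly the element-wise result.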