I'm not sure if this is the right place to ask my question (I'm new to R and this site). My question is the following: how can I calculate the distance between longitude and latitude points?
I searched this site for an answer to my problem, but the answers I found only handle two points (while my data set contains more than 207000 rows).
I have a dataframe called 'adsb_relevant_columns_correct_timedifference' containing the following columns: Callsign, Altitude, Speed, Direction, Date_Time, Latitude, Longitude.
Callsign Altitude Speed Direction Date_Time Latitude Longitude
A118 18000 110 340 2017-11-06 22:28:09 70.6086 58.2959
A118 18500 120 339 2017-11-06 22:29:09 72.1508 58.7894
B222 18500 150 350 2017-11-08 07:28:09 71.1689 59.1234
D123 19000 150 110 2018-05-29 15:13:27 69.4523 68.1235
I would like to calculate the distance (in meters) between consecutive measurements (each row is a new measurement) and add it as a new column called 'Distance'. The first calculated distance should appear on the second row, because I need it there for later steps. The first row of column 'Distance' can therefore be zero or NA; that does not matter.
So, I would like to know the distance between the first measurement (Lat 70.6086 and Long 58.2959) and the second measurement (Lat 72.1508 and Long 58.7894). Then the distance between the second and the third measurement. Then between the third and the fourth, and so on for more than 207000 measurements.
The expected output should be like this:
Callsign Altitude Speed Direction Date_Time Latitude Longitude Distance
A118 18000 110 340 2017-11-06 22:28:09 70.6086 58.2959 NA
A118 18500 120 339 2017-11-06 22:29:09 72.1508 58.7894 172000
B222 18500 150 350 2017-11-08 07:28:09 71.1689 59.1234 110000
D123 19000 150 110 2018-05-29 15:13:27 69.4523 68.1235 387000
I found the distm function in R (from the geosphere package), but I can only apply it manually to two measurements at a time, not to the complete dataset.
distm(p1, p2, fun = distHaversine)
I tried the following code
adsb_relevant_columns_correct_timedifference <- mutate(adsb_relevant_columns_correct_timedifference, Distance =
distm(c(adsb_relevant_columns_correct_timedifference$Longitude, adsb_relevant_columns_correct_timedifference$Latitude),
c(lag(adsb_relevant_columns_correct_timedifference$Longitude, adsb_relevant_columns_correct_timedifference$Latitude)), fun = distCosine))
However, I got the following error
Error in mutate_impl(.data, dots) : Evaluation error: Wrong length for a vector, should be 2.
I'm sorry for my long explanation, but I hope that my question is clear. Can someone please tell me how to calculate the distance between the several measurements and add this as a new column to my dataframe?
Instead of distm you can use the distHaversine function. Furthermore, in your mutate call you should not repeat the dataframe name and the $ operator; mutate already knows where to look for the columns. The error occurs because you need to use cbind instead of c: c creates one long vector by stacking the columns end to end, whereas cbind creates a matrix with two columns (which is what you want in this case).
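To make the difference between c and cbind concrete, here is a minimal base R sketch. It uses only the first two rows of coordinates from the question; the names stacked and paired are just illustrative.

```r
# Coordinates of the first two measurements from the question.
Longitude <- c(58.2959, 58.7894)
Latitude  <- c(70.6086, 72.1508)

# c() concatenates: all longitudes followed by all latitudes in one vector.
stacked <- c(Longitude, Latitude)

# cbind() binds the vectors as columns: one row per point, (lon, lat) per row.
paired <- cbind(Longitude, Latitude)

length(stacked)  # number of elements in the stacked vector
dim(paired)      # rows and columns of the paired matrix
```

The geosphere distance functions expect either a single point (a vector of length 2) or a two-column matrix of points, which is why the stacked vector triggers the "should be 2" error.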
library(geosphere)
library(dplyr)
mutate(mydata,
       Distance = distHaversine(cbind(Longitude, Latitude),
                                cbind(lag(Longitude), lag(Latitude))))
# Callsign Altitude Speed Direction Date_Time Latitude Longitude Distance
# 1 A118 18000 110 340 2017-11-06T22:28:09 70.6086 58.2959 NA
# 2 A118 18500 120 339 2017-11-06T22:29:09 72.1508 58.7894 172569.2
# 3 B222 18500 150 350 2017-11-08T07:28:09 71.1689 59.1234 109928.5
# 4 D123 19000 150 110 2018-05-29T15:13:27 69.4523 68.1235 387356.2
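If you want to see what happens under the hood, or cannot install geosphere, the same row-versus-previous-row calculation can be sketched in base R. This is only an illustration, not the package's code: haversine_m is a hypothetical helper name, and the radius of 6378137 m is an assumption (as far as I know it is the default used by distHaversine).

```r
# Haversine great-circle distance in metres, vectorised over its inputs.
# r is an assumed Earth radius in metres.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1.
  2 * r * asin(pmin(1, sqrt(a)))
}

# Coordinates from the example data in the question.
lat <- c(70.6086, 72.1508, 71.1689, 69.4523)
lon <- c(58.2959, 58.7894, 59.1234, 68.1235)

# Shift each vector down by one row so that row i is paired with row i - 1.
# The first row has no predecessor, so it gets NA.
prev_lat <- c(NA, head(lat, -1))
prev_lon <- c(NA, head(lon, -1))

# NA inputs propagate, so the first distance is NA.
Distance <- haversine_m(prev_lon, prev_lat, lon, lat)
Distance
```

Because everything is vectorised, this scales to the full 207000 rows without a loop, and the values should agree closely with the distHaversine output shown above.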
With distCosine it is a little bit more tricky, as it doesn't return NA if one of the input latitudes or longitudes is missing. Thus I modified the function a little bit and this solves the problem:
modified_distCosine <- function(Longitude1, Latitude1, Longitude2, Latitude2) {
  if (any(is.na(c(Longitude1, Latitude1, Longitude2, Latitude2)))) {
    NA
  } else {
    distCosine(c(Longitude1, Latitude1), c(Longitude2, Latitude2))
  }
}
mutate(mydata,
       Distance = mapply(modified_distCosine,
                         Longitude, Latitude, lag(Longitude), lag(Latitude)))
# Callsign Altitude Speed Direction Date_Time Latitude Longitude Distance
# 1 A118 18000 110 340 2017-11-06T22:28:09 70.6086 58.2959 NA
# 2 A118 18500 120 339 2017-11-06T22:29:09 72.1508 58.7894 172569.2
# 3 B222 18500 150 350 2017-11-08T07:28:09 71.1689 59.1234 109928.5
# 4 D123 19000 150 110 2018-05-29T15:13:27 69.4523 68.1235 387356.2
Here I use mapply to apply the modified function with the arguments Longitude, Latitude, lag(Longitude), lag(Latitude).
I'm quite sure there has to be a more elegant way, but at least this works.
Data
mydata <- structure(list(Callsign = c("A118", "A118", "B222", "D123"),
Altitude = c(18000L, 18500L, 18500L, 19000L),
Speed = c(110L, 120L, 150L, 150L),
Direction = c(340L, 339L, 350L, 110L),
Date_Time = c("2017-11-06T22:28:09", "2017-11-06T22:29:09", "2017-11-08T07:28:09", "2018-05-29T15:13:27"),
Latitude = c(70.6086, 72.1508, 71.1689, 69.4523),
Longitude = c(58.2959, 58.7894, 59.1234, 68.1235)),
.Names = c("Callsign", "Altitude", "Speed", "Direction", "Date_Time", "Latitude", "Longitude"),
class = "data.frame", row.names = c(NA, -4L))