
Calculate distance between consecutive longitude/latitude points in a dataframe in R

I'm not sure if this is the right place to ask my question (I'm new to R and this site). My question is the following: how can I calculate the distance between longitude and latitude points?

I searched this site for an answer to my problem, but the answers only considered 2 points (while I have a data set containing more than 207000 rows).

I have a dataframe called 'adsb_relevant_columns_correct_timedifference' containing the following columns: Callsign, Altitude, Speed, Direction, Date_Time, Latitude, Longitude.

Callsign Altitude  Speed Direction   Date_Time             Latitude     Longitude
A118       18000    110  340         2017-11-06 22:28:09   70.6086      58.2959
A118       18500    120  339         2017-11-06 22:29:09   72.1508      58.7894
B222       18500    150  350         2017-11-08 07:28:09   71.1689      59.1234
D123       19000    150  110         2018-05-29 15:13:27   69.4523      68.1235

I would like to calculate the distance (in meters) between each measurement and the previous one (each row is a new measurement) and add this as a new column called 'Distance'. The first distance should appear on the second row, because I need it there for later processing. The first row of column 'Distance' can therefore be zero or NA; that does not matter.

So, I would like to know the distance between the first measurement (Lat 70.6086 and Long 58.2959) and the second measurement (Lat 72.1508 and Long 58.7894). Then the distance between the second and the third measurement. Then between the third and the fourth, and so on for more than 207000 measurements.

The expected output should be like this:

Callsign Altitude  Speed Direction   Date_Time             Latitude     Longitude  Distance 
A118       18000    110  340         2017-11-06 22:28:09   70.6086      58.2959    NA  
A118       18500    120  339         2017-11-06 22:29:09   72.1508      58.7894    172000
B222       18500    150  350         2017-11-08 07:28:09   71.1689      59.1234    110000
D123       19000    150  110         2018-05-29 15:13:27   69.4523      68.1235    387000

I found the distm function (from the geosphere package), but I can only get it to work manually for two measurements at a time, not for the complete dataset.

distm(p1, p2, fun = distHaversine)

I tried the following code

adsb_relevant_columns_correct_timedifference <- mutate(adsb_relevant_columns_correct_timedifference, Distance =
distm(c(adsb_relevant_columns_correct_timedifference$Longitude, adsb_relevant_columns_correct_timedifference$Latitude),
      c(lag(adsb_relevant_columns_correct_timedifference$Longitude, adsb_relevant_columns_correct_timedifference$Latitude)), fun = distCosine))

However, I got the following error

Error in mutate_impl(.data, dots) : Evaluation error: Wrong length for a vector, should be 2.

I'm sorry for my long explanation, but I hope that my question is clear. Can someone please tell me how to calculate the distance between the several measurements and add this as a new column to my dataframe?

asked Mar 28 '18 11:03 by Arjan



1 Answer

Instead of distm you can use the distHaversine function. Also, inside your mutate call you should not repeat the dataframe name with the $ operator; mutate already knows where to look for the columns. The error occurs because you need to use cbind instead of c: c creates one long vector by simply stacking the columns together, whereas cbind creates a two-column matrix with one longitude/latitude pair per row (which is what you want in this case).

library(geosphere)
library(dplyr)

mutate(mydata, 
       Distance = distHaversine(cbind(Longitude, Latitude),
                                cbind(lag(Longitude), lag(Latitude))))

#   Callsign Altitude Speed Direction           Date_Time Latitude Longitude Distance
# 1     A118    18000   110       340 2017-11-06T22:28:09  70.6086   58.2959       NA
# 2     A118    18500   120       339 2017-11-06T22:29:09  72.1508   58.7894 172569.2
# 3     B222    18500   150       350 2017-11-08T07:28:09  71.1689   59.1234 109928.5
# 4     D123    19000   150       110 2018-05-29T15:13:27  69.4523   68.1235 387356.2
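
The difference between c and cbind can be checked directly in base R. This is only a small sketch using the coordinates from the question; the names lon, lat, stacked and paired are illustrative and not part of the original code.

```r
# Coordinates from the question's four example rows
lon <- c(58.2959, 58.7894, 59.1234, 68.1235)
lat <- c(70.6086, 72.1508, 71.1689, 69.4523)

# c() stacks both columns into a single vector of length 8,
# so it is no longer clear which values belong to the same point
stacked <- c(lon, lat)
length(stacked)

# cbind() keeps one (longitude, latitude) pair per row
paired <- cbind(lon, lat)
dim(paired)
```

A single point has to be a vector of length 2, which is what the "should be 2" in the error message refers to; several points have to be passed as a two-column matrix like paired.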

With distCosine it is a bit trickier, because it doesn't return NA if one of the input latitudes or longitudes is missing, which is the case for the first row, where lag() yields NA. I therefore wrapped the function in a small helper that returns NA for such rows, and this solves the problem:

modified_distCosine <- function(Longitude1, Latitude1, Longitude2, Latitude2) {
  if (any(is.na(c(Longitude1, Latitude1, Longitude2, Latitude2)))) {
    NA
  } else {
    distCosine(c(Longitude1, Latitude1), c(Longitude2, Latitude2))
  }
}

mutate(mydata, 
       Distance = mapply(modified_distCosine, 
                         Longitude, Latitude, lag(Longitude), lag(Latitude)))

#   Callsign Altitude Speed Direction           Date_Time Latitude Longitude Distance
# 1     A118    18000   110       340 2017-11-06T22:28:09  70.6086   58.2959       NA
# 2     A118    18500   120       339 2017-11-06T22:29:09  72.1508   58.7894 172569.2
# 3     B222    18500   150       350 2017-11-08T07:28:09  71.1689   59.1234 109928.5
# 4     D123    19000   150       110 2018-05-29T15:13:27  69.4523   68.1235 387356.2

Here I use mapply to apply the modified function with the arguments Longitude, Latitude, lag(Longitude), lag(Latitude).
I'm quite sure there has to be a more elegant way, but at least this works.
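
One possible way to avoid mapply, sketched here under assumptions: pair rows 1..(n-1) with rows 2..n, compute all distances in one vectorized call, and prepend NA for the first row, so that no missing value ever reaches the distance function. To keep the sketch runnable without extra packages, cosine_dist is a hypothetical base R stand-in for the spherical law of cosines with an assumed Earth radius of 6378137 m; it is not geosphere's code. The same slicing should carry over to distCosine when it is given two-column matrices built with cbind.

```r
# Hypothetical helper: spherical law of cosines, vectorized over points.
# Inputs are in degrees; r is an assumed Earth radius in metres.
cosine_dist <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad
  phi2 <- lat2 * to_rad
  dlon <- (lon2 - lon1) * to_rad
  cos_angle <- sin(phi1) * sin(phi2) + cos(phi1) * cos(phi2) * cos(dlon)
  # Clamp to [-1, 1] to guard against rounding before acos()
  acos(pmin(pmax(cos_angle, -1), 1)) * r
}

# Coordinates from the example data
lon <- c(58.2959, 58.7894, 59.1234, 68.1235)
lat <- c(70.6086, 72.1508, 71.1689, 69.4523)
n <- length(lon)

# Rows 1..(n-1) are the previous points, rows 2..n the current ones;
# the first row has no predecessor, so it gets NA
Distance <- c(NA, cosine_dist(lon[-n], lat[-n], lon[-1], lat[-1]))
```

Because the whole column is computed in a single call instead of one function call per row, this variant may also scale better to the 207000 rows mentioned in the question.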

Data

mydata <- structure(list(Callsign = c("A118", "A118", "B222", "D123"), 
                         Altitude = c(18000L, 18500L, 18500L, 19000L), 
                         Speed = c(110L, 120L, 150L, 150L), 
                         Direction = c(340L, 339L, 350L, 110L), 
                         Date_Time = c("2017-11-06T22:28:09", "2017-11-06T22:29:09", "2017-11-08T07:28:09", "2018-05-29T15:13:27"), 
                         Latitude = c(70.6086, 72.1508, 71.1689, 69.4523), 
                         Longitude = c(58.2959, 58.7894, 59.1234, 68.1235)), 
                    .Names = c("Callsign", "Altitude", "Speed", "Direction", "Date_Time", "Latitude", "Longitude"), 
                    class = "data.frame", row.names = c(NA, -4L))
answered Oct 04 '22 19:10 by kath