I have a data frame with data about a driver and the route they followed. I'm trying to figure out the total mileage traveled. I'm using the geosphere
package but can't figure out the correct way to apply it and get an answer in miles.
> head(df1)
id routeDateTime driverId lat lon
1 1 2012-11-12 02:08:41 123 76.57169 -110.8070
2 2 2012-11-12 02:09:41 123 76.44325 -110.7525
3 3 2012-11-12 02:10:41 123 76.90897 -110.8613
4 4 2012-11-12 03:18:41 123 76.11152 -110.2037
5 5 2012-11-12 03:19:41 123 76.29013 -110.3838
6 6 2012-11-12 03:20:41 123 76.15544 -110.4506
So far I've tried
spDists(cbind(df1$lon,df1$lat))
and several other functions but can't seem to get a reasonable answer.
Any suggestions?
> dput(df1)
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40), routeDateTime = c("2012-11-12 02:08:41",
"2012-11-12 02:09:41", "2012-11-12 02:10:41", "2012-11-12 03:18:41",
"2012-11-12 03:19:41", "2012-11-12 03:20:41", "2012-11-12 03:21:41",
"2012-11-12 12:08:41", "2012-11-12 12:09:41", "2012-11-12 12:10:41",
"2012-11-12 02:08:41", "2012-11-12 02:09:41", "2012-11-12 02:10:41",
"2012-11-12 03:18:41", "2012-11-12 03:19:41", "2012-11-12 03:20:41",
"2012-11-12 03:21:41", "2012-11-12 12:08:41", "2012-11-12 12:09:41",
"2012-11-12 12:10:41", "2012-11-12 02:08:41", "2012-11-12 02:09:41",
"2012-11-12 02:10:41", "2012-11-12 03:18:41", "2012-11-12 03:19:41",
"2012-11-12 03:20:41", "2012-11-12 03:21:41", "2012-11-12 12:08:41",
"2012-11-12 12:09:41", "2012-11-12 12:10:41", "2012-11-12 02:08:41",
"2012-11-12 02:09:41", "2012-11-12 02:10:41", "2012-11-12 03:18:41",
"2012-11-12 03:19:41", "2012-11-12 03:20:41", "2012-11-12 03:21:41",
"2012-11-12 12:08:41", "2012-11-12 12:09:41", "2012-11-12 12:10:41"
), driverId = c(123, 123, 123, 123, 123, 123, 123, 123, 123,
123, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 789, 789,
789, 789, 789, 789, 789, 789, 789, 789, 246, 246, 246, 246, 246,
246, 246, 246, 246, 246), lat = c(76.5716897079255, 76.4432530414779,
76.9089707506355, 76.1115217276383, 76.2901271982118, 76.155437662499,
76.4115052509587, 76.8397977722343, 76.3357809444424, 76.032417796785,
76.5716897079255, 76.4432530414779, 76.9089707506355, 76.1115217276383,
76.2901271982118, 76.155437662499, 76.4115052509587, 76.8397977722343,
76.3357809444424, 76.032417796785, 76.5716897079255, 76.4432530414779,
76.9089707506355, 76.1115217276383, 76.2901271982118, 76.155437662499,
76.4115052509587, 76.8397977722343, 76.3357809444424, 76.032417796785,
76.5716897079255, 76.4432530414779, 76.9089707506355, 76.1115217276383,
76.2901271982118, 76.155437662499, 76.4115052509587, 76.8397977722343,
76.3357809444424, 76.032417796785), lon = c(-110.80701574916,
-110.75247172825, -110.861284852726, -110.203674311982, -110.383751512505,
-110.450569844106, -110.22185564111, -110.556956546381, -110.24483308522,
-110.217355202651, -110.80701574916, -110.75247172825, -110.861284852726,
-110.203674311982, -110.383751512505, -110.450569844106, -110.22185564111,
-110.556956546381, -110.24483308522, -110.217355202651, -110.80701574916,
-110.75247172825, -110.861284852726, -110.203674311982, -110.383751512505,
-110.450569844106, -110.22185564111, -110.556956546381, -110.24483308522,
-110.217355202651, -110.80701574916, -110.75247172825, -110.861284852726,
-110.203674311982, -110.383751512505, -110.450569844106, -110.22185564111,
-110.556956546381, -110.24483308522, -110.217355202651)), .Names = c("id",
"routeDateTime", "driverId", "lat", "lon"), row.names = c(NA,
-40L), class = "data.frame")
How about this?
## Setup
library(geosphere)
metersPerMile <- 1609.34
pts <- df1[c("lon", "lat")]
## Pass in two derived data.frames that are lagged by one point
segDists <- distVincentyEllipsoid(p1 = pts[-nrow(pts), ],
                                  p2 = pts[-1, ])
sum(segDists)/metersPerMile
# [1] 1013.919
(To use one of the faster distance calculation algorithms, just substitute distCosine, distVincentySphere, or distHaversine for distVincentyEllipsoid in the call above.)
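Note that the total above runs over every row of df1, so it also counts the jump from one driver's last point to the next driver's first point. If the goal is mileage per driver, one possible approach is to split the data by driverId and sum the segments within each group. The sketch below assumes the rows can be ordered by routeDateTime; routeMiles and the toy data frame are hypothetical names and made-up coordinates for illustration, not part of the original data.

```r
library(geosphere)

metersPerMile <- 1609.34

# Hypothetical helper: miles along one driver's track, in time order
routeMiles <- function(d) {
  d <- d[order(d$routeDateTime), ]
  pts <- as.matrix(d[, c("lon", "lat")])
  if (nrow(pts) < 2) return(0)  # a single point has no segments
  segDists <- distHaversine(pts[-nrow(pts), , drop = FALSE],
                            pts[-1, , drop = FALSE])
  sum(segDists) / metersPerMile
}

# Toy data in the same shape as df1 (made-up coordinates)
toy <- data.frame(
  routeDateTime = c("2012-11-12 02:08:41", "2012-11-12 02:09:41",
                    "2012-11-12 02:10:41", "2012-11-12 02:08:41",
                    "2012-11-12 02:09:41"),
  driverId = c(123, 123, 123, 456, 456),
  lat = c(0, 1, 2, 10, 10),
  lon = c(0, 0, 0, 20, 20)
)

# One total per driver, named by driverId
milesByDriver <- sapply(split(toy, toy$driverId), routeMiles)
milesByDriver
```

With the real data, the last line would become sapply(split(df1, df1$driverId), routeMiles).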
Be VERY careful with missing data: distVincentyEllipsoid() returns 0 for the distance between two points whose coordinates are both missing, i.e. c(NA, NA) and c(NA, NA), so such segments silently add nothing to the summed total.
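One way to guard against this, sketched here under the assumption that dropping incomplete rows is acceptable for your data, is to filter with complete.cases() before building the lagged point sets. The coordinates below are made up. Be aware that dropping a row joins its two neighbours with a straight segment, which may understate the true distance.

```r
library(geosphere)

# Toy track with one missing fix in the middle (made-up coordinates)
pts <- data.frame(lon = c(0, NA, 0), lat = c(0, NA, 2))

# Keep only rows where both lon and lat are present
clean <- as.matrix(pts[complete.cases(pts), ])
nDropped <- nrow(pts) - nrow(clean)  # worth reporting alongside the total

# Segment distances in meters between consecutive remaining points
segDists <- distVincentyEllipsoid(clean[-nrow(clean), , drop = FALSE],
                                  clean[-1, , drop = FALSE])
sum(segDists) / 1609.34
```

Checking nDropped before trusting the total makes it obvious how many fixes were missing.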