
Reading GTFS-realtime files using R?

I want to analyze GTFS-realtime files using R. Compared to static GTFS, these files are compiled, so reading them is trickier.

Googling around, I have only found this package that deals with GTFS: https://github.com/ropenscilabs/gtfsr

But again, this is just for static GTFS.

Are you aware of a CRAN or GitHub R package that deals with GTFS-realtime?

An alternative solution would be to convert the GTFS-RT feed into a more readable format such as JSON, as discussed in "streaming gtfs real time data into human readable format".

Xavier Prudent asked Feb 09 '17 09:02


2 Answers

I notice you already found your way over to my development package, gtfsway. In particular, issue 1 gives an example of how the package works and how it parses a realtime feed:

devtools::install_github("SymbolixAU/gtfsway")
library(gtfsway)
url <- "https://gtfsrt.api.translink.com.au/Feed/SEQ"

response <- httr::GET(url)

FeedMessage <- gtfs_realtime(response)

## the function gtfs_tripUpdates() extracts the 'trip_update' feed
lst <- gtfs_tripUpdates(FeedMessage)  

## The results will obviously change depending on when you read the data
lst[[32]]
# $dt_trip_info
#                                trip_id start_time start_date route_id
# 1: 8959814-SBL 16_17-SBL_FUL-Friday-04   12:21:00   20170303  709-739
# 
# $dt_stop_time_update
#     stop_sequence stop_id arrival_time arrival_delay departure_time departure_delay
#  1:             1  318944   1488504104         -3556     1488507660               0
#  2:             2  318946   1488507741            21     1488507741              21
#  3:             3  300444   1488507903             3     1488507903               3
#  4:             4  300058   1488507977            17     1488507977              17
#  5:             5  300059   1488508022             2     1488508022               2
#  6:             6  300060   1488508094           -46     1488508094             -46
#  7:             7  300061   1488508115           -25     1488508115             -25
#  8:             8  300062   1488508148           -52     1488508148             -52
#  9:             9  300063   1488508175           -85     1488508175             -85
# 10:            10  300005   1488508299          -141     1488508299            -141
# 11:            11  300053   1488508398          -102     1488508398            -102
# 12:            12  300054   1488508458          -102     1488508458            -102
# 13:            13  300056   1488508638          -102     1488508638            -102
# 14:            14  300055   1488508758          -102     1488508758            -102
# 15:            15  300272   1488508998          -102     1488508998            -102
# 16:            16  319160   1488509058          -102     1488509058            -102
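
The arrival_time and departure_time columns above are POSIX timestamps (seconds since 1970-01-01 UTC), and the delay columns are in seconds. A minimal sketch of making them readable in base R, using three rows copied by hand from the output above. The data frame name `stop_updates` and the "Australia/Brisbane" time zone are assumptions for illustration and are not part of gtfsway:

```r
# Three rows copied by hand from the dt_stop_time_update output above
stop_updates <- data.frame(
  stop_sequence   = c(1, 2, 3),
  stop_id         = c("318944", "318946", "300444"),
  departure_time  = c(1488507660, 1488507741, 1488507903),
  departure_delay = c(0, 21, 3)
)

# Epoch seconds -> POSIXct; the time zone only affects how the value prints
stop_updates$departure_local <- as.POSIXct(
  stop_updates$departure_time,
  origin = "1970-01-01",
  tz = "Australia/Brisbane"  # assumed local zone for the Translink feed
)

# Delays in minutes are often easier to read than seconds
stop_updates$departure_delay_min <- stop_updates$departure_delay / 60

stop_updates[, c("stop_id", "departure_local", "departure_delay_min")]
```

With these values the first departure comes out as 12:21:00 local time, which matches the start_time reported in dt_trip_info.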

I'm open to contributions & suggestions on the package if you have any.

SymbolixAU answered Sep 23 '22 10:09


GTFS realtime feeds are binary Protocol Buffers, which can be processed with the RProtoBuf package.

A simple worked example using my local South-east Queensland Translink feed:

library(RProtoBuf)

Load the proto file, which specifies the format that the feed files follow:

download.file(url="https://gtfsrt.api.translink.com.au/api/realtime/protobuf", destfile="translink-gtfs-realtime.proto")
readProtoFiles("translink-gtfs-realtime.proto")

Check the 'Descriptors' that are now available in the 'Descriptor Pool' for loading feeds:

ls("RProtoBuf:DescriptorPool")
## [1] "GTFSv2.Realtime.Alert"             "GTFSv2.Realtime.EntitySelector"   
## [3] "GTFSv2.Realtime.FeedEntity"        "GTFSv2.Realtime.FeedHeader"       
## [5] "GTFSv2.Realtime.FeedMessage"       "GTFSv2.Realtime.Position"
## ...

Read the actual feeds. In this case the data is stored in the 'entity' field of the 'FeedMessage':

## mode="wb" keeps the binary feed files intact on Windows
download.file(url="https://gtfsrt.api.translink.com.au/api/realtime/SEQ/TripUpdates", destfile="SEQ-TripUpdates.pb", mode="wb")
download.file(url="https://gtfsrt.api.translink.com.au/api/realtime/SEQ/VehiclePositions", destfile="SEQ-VehiclePositions.pb", mode="wb")

vehicle_position_feed <- read(GTFSv2.Realtime.FeedMessage,  "SEQ-VehiclePositions.pb")[["entity"]]
trip_update_feed  <- read(GTFSv2.Realtime.FeedMessage,  "SEQ-TripUpdates.pb")[["entity"]]

When read, each object is just a set of pointers to parts of the binary file:

str(vehicle_position_feed)
##List of 6
## $ :Formal class 'Message' [package "RProtoBuf"] with 2 slots
##  .. ..@ pointer:<externalptr> 
##  .. ..@ type   : chr "GTFSv2.Realtime.FeedEntity"
## $ :Formal class 'Message' [package "RProtoBuf"] with 2 slots
##  .. ..@ pointer:<externalptr> 
##  .. ..@ type   : chr "GTFSv2.Realtime.FeedEntity"
## .. 

You can then extract info from each data point by looping over the list of entities to construct datasets to work with, e.g.:

data.frame(
  id = sapply(vehicle_position_feed, \(x) x[["id"]] ),
  latitude = sapply(vehicle_position_feed, \(x) x[["vehicle"]][["position"]][["latitude"]] ),
  longitude = sapply(vehicle_position_feed, \(x) x[["vehicle"]][["position"]][["longitude"]] )
)
##                   id  latitude longitude
##1     VU-2123549587_1 -27.06561  153.1595
##2    VU-1176076363_10 -27.30158  152.9881
##3   VU--1272517086_10 -27.49080  153.2397
## ...
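
The same loop can also be written with vapply(), which stops with an error if an entity does not yield exactly one value of the expected type, where sapply() would quietly return a list instead. A minimal sketch of the pattern: `entities` is a hypothetical stand-in built from plain nested R lists (values copied from the output above) so that the block runs without RProtoBuf or a live feed, and `get_field` is an illustrative helper, not part of any package. That it carries over unchanged to RProtoBuf Message objects is an assumption, although they are indexed with the same `[[ ]]` calls above.

```r
# Hypothetical stand-in for the feed: plain nested lists, not RProtoBuf messages
entities <- list(
  list(id = "VU-2123549587_1",
       vehicle = list(position = list(latitude = -27.06561, longitude = 153.1595))),
  list(id = "VU-1176076363_10",
       vehicle = list(position = list(latitude = -27.30158, longitude = 152.9881)))
)

# Illustrative helper: walk a path of field names with repeated [[ ]] indexing
get_field <- function(x, path) {
  for (name in path) x <- x[[name]]
  x
}

# vapply() checks that every entity returns a single value of the stated type
positions <- data.frame(
  id        = vapply(entities, get_field, character(1), path = "id"),
  latitude  = vapply(entities, get_field, numeric(1),
                     path = c("vehicle", "position", "latitude")),
  longitude = vapply(entities, get_field, numeric(1),
                     path = c("vehicle", "position", "longitude"))
)
positions
```

Naming the expected type up front means a malformed or incomplete entity surfaces as an error at extraction time rather than as an oddly shaped column later on.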
thelatemail answered Sep 21 '22 10:09