Calculating trip travel times using available geo APIs for 5k+ addresses

I'm working on a transportation model and am about to build a travel time matrix between 5,000 points. Is there a free, semi-reliable way to calculate the travel times between all my nodes?

I think Google Maps has a limit on the number of queries/hits I can make.

EDIT

I'd like to use an API such as Google Maps or similar ones, as they include data such as road directions, number of lanes, posted speeds, type of road, etc.

EDIT 2

Please be advised that OpenStreetMap data is incomplete and not available for all jurisdictions outside the US.

asked Jul 13 '10 by dassouki


4 Answers

The Google Directions API restricts you to 2,500 calls per day. Additionally, the terms of service stipulate that you must only use the service "in conjunction with displaying the results on a Google map".

You may be interested in OpenTripPlanner, an in-development project that can do multi-modal routing, and in Graphserver, on which OpenTripPlanner is built.

One approach would be to use OpenStreetMap data with Graphserver to generate Shortest Path Trees from each node.
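
A rough sketch of that shortest-path-tree idea, using networkx in place of Graphserver (a swap purely for illustration; the graph, node names, and travel_time weights below are invented): one Dijkstra run per origin yields travel times to every other node, so 5,000 runs cover the whole matrix instead of ~12.5 million pairwise queries.

    # Assumption: networkx stands in for Graphserver; edge weights are made up.
    import networkx as nx

    G = nx.DiGraph()
    G.add_edge("A", "B", travel_time=120)   # seconds, hypothetical
    G.add_edge("B", "C", travel_time=300)
    G.add_edge("A", "C", travel_time=500)

    matrix = {}
    for origin in G.nodes:
        # One Dijkstra pass gives the travel time from origin to every
        # reachable node, i.e. one full row of the matrix.
        matrix[origin] = nx.single_source_dijkstra_path_length(
            G, origin, weight="travel_time"
        )

    print(matrix["A"])  # {'A': 0, 'B': 120, 'C': 420}

With OpenStreetMap data, the same loop would run over the real road graph rather than this toy one.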

answered by tcarobruce


As that's 12,497,500 total connections (5,000 choose 2, roughly 12.5 million), I'm pretty sure you'll hit some sort of limit if you attempt to use Google Maps for all of them. How accurate do the results need to be, and how far are you travelling?

I might try to generate a crude map with travel speeds on it (e.g., mark interstates as fast, and so on), then use some software to calculate how long it would take from point to point. One could treat it as an electromagnetics problem, calculating the resistance from point to point over a plane of varying resistivity (interstates are wires, lakes are open circuits...).
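
A minimal sketch of that crude-map idea, assuming a toy grid of speeds (all numbers invented): each cell carries a speed, impassable cells get zero, and Dijkstra over the grid plays the role of the field solver.

    # Assumption: a hand-made raster of speeds; 0 marks impassable cells (lakes).
    import heapq

    speed = [  # km/h per cell: 110 ~ interstate, 50 ~ local road
        [110, 110, 110, 110],
        [ 50,   0,   0,  50],
        [ 50,  50,  50,  50],
    ]
    CELL_KM = 1.0  # assumed cell edge length

    def travel_minutes(start, goal):
        """Dijkstra over the grid; returns minutes from start to goal."""
        rows, cols = len(speed), len(speed[0])
        best = {start: 0.0}
        pq = [(0.0, start)]
        while pq:
            t, (r, c) = heapq.heappop(pq)
            if (r, c) == goal:
                return t
            if t > best[(r, c)]:
                continue  # stale queue entry
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and speed[nr][nc] > 0:
                    nt = t + 60.0 * CELL_KM / speed[nr][nc]  # minutes per cell
                    if nt < best.get((nr, nc), float("inf")):
                        best[(nr, nc)] = nt
                        heapq.heappush(pq, (nt, (nr, nc)))
        return float("inf")  # unreachable (an open circuit)

    print(travel_minutes((0, 0), (2, 3)))  # routes around the "lake" cells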

answered by Nick T


If you really need all these routes accurately calculated and stored in your database, it sounds like (and I would believe) you are going to have to spend the money to obtain this. As you can imagine, such data is expensive to develop, and there should be remuneration.

I would, however, probe a bit into your problem:

  • Do you really need all ~12.5 million distances in a database? What if you asked Google for them as you needed them, and then cached them (if allowed)? I've had web applications like this where, because of the slow traffic ramp-up pattern, I was able to leverage free services early on to vet the idea. (A minimal caching sketch follows this list.)
  • Do you really need all 5000 points? Or could you pick the top 100 and have a more tractable problem?
  • Perhaps there is some hybrid where you store distances between big cities and do more estimates for shorter distances.
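
Here's a minimal sketch of the cache-as-you-go idea from the first bullet; fetch_from_api is a hypothetical stand-in for whatever routing service you end up calling, and the sqlite schema is just one possible shape:

    # Assumption: fetch_from_api is a placeholder, not a real client library.
    import sqlite3

    db = sqlite3.connect("travel_times.db")
    db.execute(
        "CREATE TABLE IF NOT EXISTS times "
        "(origin TEXT, dest TEXT, seconds REAL, PRIMARY KEY (origin, dest))"
    )

    def fetch_from_api(origin, dest):
        raise NotImplementedError("call your routing API here")

    def travel_time(origin, dest):
        row = db.execute(
            "SELECT seconds FROM times WHERE origin = ? AND dest = ?",
            (origin, dest),
        ).fetchone()
        if row:
            return row[0]                        # cache hit: free
        seconds = fetch_from_api(origin, dest)   # cache miss: one paid call
        db.execute("INSERT INTO times VALUES (?, ?, ?)", (origin, dest, seconds))
        db.commit()
        return seconds

That way the daily quota is only spent on pairs someone actually asks about.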

Again, I really don't know what your problem is, but maybe thinking a bit outside the box will help you find an easier solution.

answered by ndp


You might have to resort to heuristics here. Maybe you can estimate travel time based on a few factors, like geometric distance plus some features of the start and end points (urban vs. rural area, country, ...). You could fetch a few real distances, fit your parameters on a subset of them, and see how well you can predict the others. My prediction would be, for example, that in many cases travel time approaches a linear function of distance as the distance grows.
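
As a sketch of that fit-on-a-subset idea (the sample numbers below are fabricated), you could fit time ~ a * distance + b on a small batch of real API results and use the fitted line for everything else:

    # Assumption: fabricated (distance_km, minutes) pairs from a few API calls.
    import numpy as np

    sample = np.array([
        [  5.0,  12.0],
        [ 20.0,  25.0],
        [ 80.0,  60.0],
        [250.0, 160.0],
    ])
    dist_km, minutes = sample[:, 0], sample[:, 1]

    a, b = np.polyfit(dist_km, minutes, deg=1)  # time ~ a * distance + b

    def estimate_minutes(distance_km):
        return a * distance_km + b

    print(estimate_minutes(100.0))  # prediction for an unqueried pair

Holding out part of the sample tells you how much the estimate can be trusted.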

I know it's messy, but hey, you're trying to estimate 12.5 million data points (or whatever the amount turns out to be :)

You might also be able to incrementally add knowledge from already-retrieved "real" travel times by finding cached points close to the ones you're looking for:

  • Find the closest points StartApprox and EndApprox to the start and end positions such that you already have a travel time between StartApprox and EndApprox.
  • Compute the distances StartError (start to StartApprox) and EndError (end to EndApprox).
  • If StartError + EndError > Distance(StartApprox, EndApprox) * 0.10 (or whatever your threshold is), compute the distance via the API (and store it); otherwise, use the known travel time plus an overhead time based on StartError + EndError.

(If you have 100 addresses in NY and 100 in SF, all the pairwise values are going to be more or less the same; the difference between them is probably lower than the uncertainty involved in these predictions. Such an approach would keep you from issuing 10,000 queries where one would do.)
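
A sketch of that rule, assuming coordinates in kilometres, a dict known of already-fetched times keyed by (start, end) point pairs, a hypothetical fetch callback for the real API, and an invented overhead_per_km fudge factor:

    # Assumption: known maps ((x, y), (x, y)) pairs to travel times; fetch
    # is a placeholder for the real API call.
    import math

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def travel_time(start, end, known, fetch, threshold=0.10, overhead_per_km=1.0):
        if not known:                            # nothing cached yet
            known[(start, end)] = fetch(start, end)
            return known[(start, end)]
        # nearest cached pair (StartApprox, EndApprox) to our start and end
        (sa, ea), t_known = min(
            known.items(),
            key=lambda kv: dist(start, kv[0][0]) + dist(end, kv[0][1]),
        )
        start_err, end_err = dist(start, sa), dist(end, ea)
        if start_err + end_err > dist(sa, ea) * threshold:
            known[(start, end)] = fetch(start, end)   # too far off: real call
            return known[(start, end)]
        # close enough: reuse the known time plus a crude overhead term
        return t_known + (start_err + end_err) * overhead_per_km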

answered by Nicolas78