How does the Google geocoder work?

Tags:

I am curious as to how the Google geocoder works.

I have been studying some implementations of open source geocoders like geocommons' geocoder or PostGIS's new Tiger Geocoder. This is roughly what I know so far (to hopefully prove that I have been doing my homework) :

I realize that at the core of the open source geocoders, there are three main elements.

1.- An address normalizer that takes an arbitrary string and normalizes it (taking the example from here):

normalize_address('address string');

e.g.: SELECT naddy.* FROM normalize_address('29645 7th Street SW Federal Way 98023') AS naddy;

 address | predirabbrev |      streetname       | streettypeabbrev | postdirabbrev | internal | location | stateabbrev |  zip  | parsed
 ---------+-------------+-----------------------+------------------+---------------+----------+----------+-------------+-------+--------
 29645 |               | 7th Street SW Federal  | Way              |               |          |          |             | 98023 |

and:

2.- A geocoder that does some magical fuzzy matching for names where the core algorithm is the Levenshtein Distance.

A good example is the one from the Wikipedia article where it calculates the Levenshtein distance between the words kitten and sitting (the distance is 3 since that is the number of edits required to change one string into the other):

kitten → sitten (substitution of 's' for 'k')
sitten → sittin (substitution of 'i' for 'e')
sittin → sitting (insertion of 'g' at the end).

3.- Some interpolation of the street segments at the end to guess where the house is. I downloaded a chunk of the free Census Tiger street dataset to create this example.

street interpolation example

In the example above, the street segment of interest (Schaeffer Hills Dr) has a from node that starts at 300 (so 300 Schaeffer Hills Dr) and a to node that ends on 400 (400 Schaeffer Hills Drv). If I matched to this Schaeffer Hills Drv, and the request was for street 310, then the algorithm would just interpolate to it (traverse 10% of it) to where my green arrow is.

This is what the Open Source geocoder tools do. Nevertheless, Google is clearly smarter than that and uses all kinds of non-traditional hints.

How so?

For example, I can type 680 Mission st (no city, state, county, anything at all). Most of the standard address normalizers would blow up because they would find too many matches. But since I am in SF, I am guessing google uses my ip to get some geoip-like information, does some expanding bounding as a hint with some fuzzy search, and right away finds the closest segment that matches and tells me that's my answer (which is correct!).

I am looking for answers that can shed some more light into how the Google geocoder works besides the techniques that I described above.

Update:

OK, so far we have two kinds of hints listed

Geoip as hints
Area of Interest Bounding Box (see Paul's example).
Others?

470

asked Jun 12 '12 06:06

rburhum

2 Answers

One of the things you can find by poking at the black box is that the Google geocoder isn't totally sensitive to the order of the tokens (there's no enforced street/city/state/country expectation, though it does better when you do follow that). Which says to me that they might be dumping everything into some kind of full text search and then seeing what comes back. Or perhaps not. Try searching "sault saint marie adams 200" and "sault saint marie 200 adams".

With respect to your Mission example, that's a great one, as you can see the map hint coming into play directly:

Query with map window over Europe: European results.

Query with map window over Europe: European results

Query with map window over North America: American results.

Query with map window over North America: American results

answered Sep 19 '22 12:09

Paul Ramsey

There is another source of data: county property maps. These don't just include roads, but also property lines (and their street addresses). You can often see this on Google's map, it will actually show faint lines that separate adjacent properties. Sometimes they even outline buildings (county maps often include these too).

You can also do the reverse lookup, given your GPS coordinates finding your exact address can be as simple as a 2D query to find which property polygon you're in. I've seen this work properly when I was physically far from the road but still inside the property and it returned the correct street address despite the handset being closer to another street.

Note that these maps tend to be public and some counties even have their own online interface. You can even look up who owns a particular plot.

answered Sep 16 '22 12:09

Adam

Related questions
                            
                                Add text on custom marker on google map for ios
                            
                                Why are the Google Maps controls & InfoWindow broken/skewed when using jQuery Mobile?
                            
                                Maps API keys for geocoding not working anymore?
                            
                                Python Google Maps Driving Time
                            
                                API Key not found when adding Google Maps to Flutter
                            
                                To Display the Google map on the emmulator
                            
                                AndroidDebugKey Keystore was tampered with, or password was incorrect [duplicate]
                            
                                Google Maps GMSMapView on custom UIView
                            
                                Could not find class 'com.google.android.gms.location.internal.ParcelableGeofence', referenced from method glt.a
                            
                                Why is the `sll` query parameter for a Maps URL being (sort of) ignored?
                            
                                How to set google map loading tiles color?
                            
                                Radar Images on Google Maps API [closed]
                            
                                Google maps accuracy issue
                            
                                Google Map API v3 Color Customization
                            
                                Determining approximate overlaps of a given polyline with a set of existing polylines
                            
                                Google maps - keep center when zooming
                            
                                Protocol "https" not supported or disabled in libcurl
                            
                                Try Catch for error message in google maps api

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How does the Google geocoder work?

Tags:

google-maps

geocoding

google-geocoder

rburhum

People also ask

2 Answers

Paul Ramsey

Adam

Recent Activity

Donate For Us