
I need an address matching algorithm

I have looked around online for this but haven't found much. Basically, I need to compare a bunch of addresses to see if they match. The addresses could be written in all sorts of ways, for example: 1345 135th st NE, 1345 NE 135TH ST, etc. They could also be in different languages. Before I attempt to write a parsing and matching algorithm on my own, does anyone know of any libraries or ways I could easily do this? My friend thought of using the Google or Bing Maps web service: pass in the address, get back the geo-coordinates, and compare the coordinates instead of string matching. But then I would have to call a web service thousands of times for all these addresses, which is not very elegant ;) Any help would be nice :)

Kyle asked May 20 '11 22:05


2 Answers

US addresses can (usually) be uniquely represented by a 12-digit number called the delivery point barcode (DPBC). This number consists of the full 9-digit ZIP Code and a 3-digit delivery point number (two delivery point digits plus a check digit). This is what is used to form barcodes on mail pieces to speed up delivery. A CASS-certified service can provide the 12-digit delivery point and even flag duplicates for you.

In the interest of full disclosure I work for SmartyStreets, which was formerly Qualified Address, which was mentioned in the other answer by Mowgli.

We provide an API that can be queried as well as a batch processing service (which will flag duplicates as explained above).

Keep in mind that even the 12-digit DPBC doesn't always uniquely identify a particular address. This happens frequently when a particular street block, or 9-digit ZIP code, has a long stretch of homes that have similar primary numbers. In these cases, it's best to use a CASS service to standardize and validate the addresses, then hash them for convenient comparisons. (But as said, duplicates will already be flagged by some CASS services.)
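As a rough illustration of the "standardize, then hash" idea, here is a minimal Python sketch. It assumes the input strings have already been standardized by a CASS service; the function names `address_key` and `find_duplicates` are hypothetical, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from collections import defaultdict


def address_key(standardized: str) -> str:
    """Hash an already-standardized address string into a fixed-length key."""
    # Collapse case and whitespace so trivial differences do not change the hash.
    canonical = " ".join(standardized.upper().split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_duplicates(standardized_addresses):
    """Group addresses by key and return only the groups with more than one member."""
    groups = defaultdict(list)
    for addr in standardized_addresses:
        groups[address_key(addr)].append(addr)
    return [group for group in groups.values() if len(group) > 1]
```

Storing the key next to each record (for example in an indexed database column) turns duplicate detection into an equality lookup instead of a pairwise string comparison.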

Update: SmartyStreets now provides international address verification.

mdwhatcott answered Sep 20 '22 17:09


I don't think this is a regex type of problem. You first need to convert the addresses to a comparable format.

There are several web services / products available that will standardize an address for you. Bing for "USPS Address Standardization API" and you will find a ton of information. Once the address is standardized, the comparison should be straightforward.

http://www.bing.com/search?q=usps+address+standardization+api&go=&form=QBRE&qs=n&sk=&sc=1-32
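To show what "converting to a comparable format" might look like, here is a deliberately naive Python sketch. It is not USPS standardization: the abbreviation table is a tiny illustrative subset, the names `normalize` and `same_address` are hypothetical, and it only handles ASCII letters and digits.

```python
import re

# Small, illustrative abbreviation table; real standardization covers far more.
ABBREVIATIONS = {
    "STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD",
    "NORTHEAST": "NE", "NORTHWEST": "NW", "SOUTHEAST": "SE", "SOUTHWEST": "SW",
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
}


def normalize(address: str) -> tuple:
    """Reduce a street line to a sorted tuple of uppercase, abbreviated tokens."""
    # Uppercase and keep only runs of ASCII letters/digits (drops punctuation).
    tokens = re.findall(r"[A-Z0-9]+", address.upper())
    tokens = [ABBREVIATIONS.get(token, token) for token in tokens]
    # Sorting ignores word order, so "NE 135TH ST" and "135TH ST NE" compare equal.
    return tuple(sorted(tokens))


def same_address(a: str, b: str) -> bool:
    """Compare two street lines after normalization."""
    return normalize(a) == normalize(b)
```

With this sketch, the two spellings from the question, "1345 135th st NE" and "1345 NE 135TH ST", normalize to the same tuple. Ignoring word order can also produce false matches, which is one reason a proper standardization service is the safer choice.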

Alternatively, you can geocode each address to get a set of coordinates and then compare those.

http://code.google.com/apis/maps/documentation/geocoding/
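Geocoded coordinates for the same address rarely match digit for digit, so a comparison usually needs a distance tolerance. The Python sketch below assumes you already have (latitude, longitude) pairs from whichever geocoder you use; it makes no geocoding API calls. The names `haversine_m` and `same_location` and the 25-metre default tolerance are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def same_location(p, q, tolerance_m=25.0):
    """Treat two (lat, lon) pairs as the same place if they lie within the tolerance."""
    return haversine_m(p[0], p[1], q[0], q[1]) <= tolerance_m
```

The tolerance is a trade-off: too small and two geocodings of one building fail to match, too large and neighbouring houses merge. Units in the same building may also geocode to the same point, so coordinates alone cannot distinguish them.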

Raj More answered Sep 20 '22 17:09