How do I match varying postal addresses?

Tags:

c#

I have a requirement to match U.S. postal addresses during an import process. The problem is the address line could be typed in several different ways. Example:

123 Main Street

123 Main St.

123 Main St

How do I standardize an address so that I can do matching? We are importing 10,000 addresses at a time, so I don't want to use a service like Google, Yahoo, or USPS. Is there an open source or commercial library for address standardization that is not a COM component? I don't care whether the address is real; all I care about is the matching.

Greg Finzer asked Sep 06 '12 18:09



1 Answer

This kind of thing is very complex. Some companies exist solely to provide this functionality.

I wouldn't recommend taking this on yourself; there are existing libraries and services that do this:

https://www.usps.com/business/address-management-products.htm

http://smartystreets.com/products/liveaddress-api

If those aren't options, and if the referenced link (Address Match Key Algorithm) doesn't help you, you'll basically have to boil everything down to a common denominator. For example, split the string into its constituent parts (street number, street number suffix, unit/suite number, street name, street type, and street direction). Then convert every possible abbreviation of each part, where applicable, to that common denominator. For the street type "St.", you might choose "Street" as the common denominator, in which case you'd convert "St." or "St" to "Street" before doing any matching. This assumes all the data in your database also uses "Street" for that street type.
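The abbreviation step described above could be sketched roughly as follows. This is only a minimal illustration, not a full address parser: it skips splitting the line into named parts and simply upper-cases the input, strips trailing periods, and expands each token through a small lookup table. The `AddressKey` class, the `Normalize` method, and the contents of the table are all hypothetical names and data chosen for this example; a real table would need to be far larger.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AddressKey
{
    // Hypothetical, deliberately tiny abbreviation table (an assumption for
    // illustration only). Keys are compared case-insensitively.
    private static readonly Dictionary<string, string> Expansions =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "ST", "STREET" },
            { "AVE", "AVENUE" },
            { "BLVD", "BOULEVARD" },
            { "RD", "ROAD" },
            { "DR", "DRIVE" },
        };

    // Reduces an address line to a "common denominator" string for matching.
    public static string Normalize(string addressLine)
    {
        if (string.IsNullOrWhiteSpace(addressLine))
        {
            return string.Empty;
        }

        var tokens = addressLine
            .ToUpperInvariant()
            .Split(new[] { ' ', '\t', ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => t.Trim('.'))      // "ST." becomes "ST"
            .Where(t => t.Length > 0)
            .Select(t => Expand(t));

        return string.Join(" ", tokens);
    }

    // Replaces a known abbreviation with its long form; other tokens pass through.
    // Caveat: this is context-free, so "ST" meaning "Saint" would also be expanded.
    private static string Expand(string token)
    {
        string expanded;
        return Expansions.TryGetValue(token, out expanded) ? expanded : token;
    }
}
```

With this sketch, `AddressKey.Normalize` returns the same key, `123 MAIN STREET`, for "123 Main Street", "123 Main St.", and "123 Main St", so the three variants from the question would compare equal. Because the expansion ignores a token's position, handling cases such as "St. Louis Ave" correctly would still require the parsing into parts described above.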

Peter Ritchie answered Nov 10 '22 03:11
