
Is there a library for parsing US addresses?


I have a list of US addresses that I need to break into city, state, zip code, etc.

example address : "16100 Sand Canyon Avenue, Suite 380 Irvine, CA 92618"

Does anyone know of a library or a free API to do this? The terms of service of the Google and Yahoo geocoders forbid their use in commercial projects.

It would be awesome to find a Python library that performs this.

WeaselFox asked Feb 27 '12 10:02


People also ask

How do you parse an address?

The easiest way to parse an address is to apply a regular expression. This method works best when the addresses follow a regular form. For example, if all the address strings look like STREET_NAME XX, YYYYYY CITY_NAME, you can choose a regex that splits each string into [STREET_NAME, XX, YYYYYY, CITY_NAME].
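As a rough sketch of that idea, the pattern below handles only the STREET_NAME XX, YYYYYY CITY_NAME form described above. The pattern, the function name parse_regular, and the sample strings are assumptions made for illustration. Anything that deviates from the form, such as the US address in the question, simply does not match.

```python
import re

# Hypothetical pattern for the regular form "STREET_NAME XX, YYYYYY CITY_NAME":
# street name, house number, a comma, numeric postal code, city name.
REGULAR_FORM = re.compile(
    r"^(?P<street>.+?)\s+(?P<number>\d+),\s*(?P<postcode>\d+)\s+(?P<city>.+)$"
)


def parse_regular(address):
    """Return [street, number, postcode, city], or None if the form does not match."""
    match = REGULAR_FORM.match(address.strip())
    return list(match.groups()) if match else None


# Made-up sample strings that follow the regular form.
print(parse_regular("Hauptstrasse 12, 10115 Berlin"))
# ['Hauptstrasse', '12', '10115', 'Berlin']
print(parse_regular("Unter den Linden 77, 10117 Berlin"))
# ['Unter den Linden', '77', '10117', 'Berlin']

# The address from the question has a different layout, so this pattern rejects it.
print(parse_regular("16100 Sand Canyon Avenue, Suite 380 Irvine, CA 92618"))
# None
```

Returning None on a mismatch, rather than guessing, makes it easy to route irregular addresses to a more capable parser.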

What is Libpostal?

libpostal is a C library for parsing/normalizing street addresses around the world using statistical NLP and open data. The goal of this project is to understand location-based strings in every language, everywhere.

How do you compare addresses in Python?

Method 1: Using deterministic address matching. One of the most basic ways to match addresses using Python is by comparing two strings for an exact match. It's important to note that this won't account for spelling mistakes, missing words, or parts of the address entered in a different order.
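A minimal sketch of deterministic matching might look like the following. The helper names normalize and same_address are hypothetical, and the light normalization (lowercasing, dropping commas, collapsing whitespace) is an added assumption that goes slightly beyond a raw string comparison.

```python
def normalize(address):
    """Lowercase, drop commas, and collapse runs of whitespace."""
    return " ".join(address.lower().replace(",", " ").split())


def same_address(a, b):
    """Deterministic match: True only if the normalized strings are identical."""
    return normalize(a) == normalize(b)


reference = "16100 Sand Canyon Avenue, Suite 380 Irvine, CA 92618"

# Differences in casing, commas, and spacing are absorbed by normalize().
print(same_address(reference, "16100 sand canyon avenue suite 380  irvine ca 92618"))
# True

# An abbreviation ("Ave" for "Avenue") is enough to break an exact match.
print(same_address(reference, "16100 Sand Canyon Ave, Suite 380 Irvine, CA 92618"))
# False
```

The second comparison shows the limitation noted above: any spelling variant, missing word, or reordering defeats an exact match.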


1 Answer

Quite a few of these answers are a few years old now.

The most bulletproof library I've seen recently is usaddress (https://github.com/datamade/usaddress):

  • Far more accurate than address (https://pypi.python.org/pypi/address/0.1.1), which we had been using for a year
  • I have yet to see it fail on an address
  • Still being committed to as of this writing
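For readers who want to try it, here is a minimal sketch of calling usaddress on the address from the question. It assumes the package is installed (pip install usaddress) and relies on the library's tag and parse functions. The wrapper name split_address and the fallback behaviour are hypothetical choices, not something the answer prescribes. No expected output is shown, since the labels depend on the library's trained model.

```python
try:
    import usaddress  # third-party: pip install usaddress
except ImportError:  # keep the sketch importable without the package
    usaddress = None


def split_address(address):
    """Return (components, address_type) for a US address string.

    components is a dict of label -> text when tagging succeeds. If the
    parser assigns the same label to separate parts of the string, fall
    back to the raw list of (token, label) pairs and a type of None.
    """
    if usaddress is None:
        raise RuntimeError("usaddress is not installed")
    try:
        tagged, address_type = usaddress.tag(address)
        return dict(tagged), address_type
    except usaddress.RepeatedLabelError:
        return usaddress.parse(address), None


if __name__ == "__main__" and usaddress is not None:
    components, kind = split_address(
        "16100 Sand Canyon Avenue, Suite 380 Irvine, CA 92618"
    )
    print(kind)
    print(components)
```

Catching RepeatedLabelError matters for messy input: tag raises it rather than silently merging components it cannot reconcile.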

Pro tip: when testing addresses in all these libraries, use 1) no commas in your address, 2) multi-word city names preferably with "St." in the name to see if the library can differentiate between "street" and "Saint" (e.g., St. Louis), and 3) improper casing. This combo will typically make even the better parsers fall down.
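One way to apply that tip is to generate the awkward variants automatically and feed each one to whichever parser is under test. The sketch below uses only the standard library. The function name stress_variants and the sample address are made up for illustration.

```python
def stress_variants(address):
    """Build harder-to-parse variants of an address: no commas, odd casing."""
    no_commas = " ".join(address.replace(",", " ").split())
    return {
        "no_commas": no_commas,
        "lower": no_commas.lower(),
        "upper": no_commas.upper(),
    }


# Made-up sample containing "St." both as "Street" and as "Saint".
sample = "123 Main St., St. Louis, MO 63101"

for name, variant in stress_variants(sample).items():
    # Pass each variant to the parser being evaluated and inspect the result.
    print(f"{name}: {variant}")
# no_commas: 123 Main St. St. Louis MO 63101
# lower: 123 main st. st. louis mo 63101
# upper: 123 MAIN ST. ST. LOUIS MO 63101
```

Comparing the parser's output across the variants shows quickly whether it relies on commas or capitalization to find the city boundary.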

Tyler Hayes answered Oct 31 '22 22:10