I have a list of US addresses that I need to break into city, state, ZIP code, etc.
Example address: "16100 Sand Canyon Avenue, Suite 380 Irvine, CA 92618"
Does anyone know of a library or a free API to do this? The TOS of the Google and Yahoo geocoders forbid using them for commercial projects.
It would be awesome to find a Python library that performs this.
The easiest way to parse an address is to apply a regex. This method works well when all the addresses share a regular form. For example, if every address string looks like STREET_NAME XX, YYYYYY CITY_NAME, you can write a regexp that splits each string into [STREET_NAME, XX, YYYYYY, CITY_NAME].
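A minimal sketch of the regex approach for US addresses, assuming every string follows the form "street, city, ST ZIP". The pattern and the function name parse_regular_address are illustrative assumptions, not taken from any library.

```python
import re

# Assumed regular form: "<street>, <city>, <ST> <ZIP>" with an optional ZIP+4.
ADDRESS_RE = re.compile(
    r"^(?P<street>.+?),\s*"          # everything up to the first usable comma
    r"(?P<city>[^,]+),\s*"           # city: text between two commas
    r"(?P<state>[A-Z]{2})\s+"        # two-letter state abbreviation
    r"(?P<zip>\d{5}(?:-\d{4})?)$"    # 5-digit ZIP, optionally ZIP+4
)


def parse_regular_address(text):
    """Return a dict of address parts, or None if the text does not fit the pattern."""
    match = ADDRESS_RE.match(text.strip())
    return match.groupdict() if match else None


# A well-formed address splits cleanly.
regular = parse_regular_address("123 Main Street, Springfield, IL 62701")

# The example from the question has no comma before the city, so the
# suite number ends up inside the "city" group.
irregular = parse_regular_address(
    "16100 Sand Canyon Avenue, Suite 380 Irvine, CA 92618"
)
```

The second call shows the limit of this method: the pattern still matches, but the city comes back as "Suite 380 Irvine", which is why a regex alone is only reliable on uniformly formatted input.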
libpostal is a C library for parsing/normalizing street addresses around the world using statistical NLP and open data. The goal of this project is to understand location-based strings in every language, everywhere.
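For Python, libpostal is usually reached through the pypostal bindings, whose postal.parser.parse_address function returns a list of (value, label) pairs. The sketch below assumes pypostal and the libpostal data files are already installed; components_to_dict and parse_with_libpostal are hypothetical helper names added for illustration.

```python
def components_to_dict(pairs):
    """Collapse (value, label) pairs into a dict keyed by label."""
    result = {}
    for value, label in pairs:
        # Join repeated labels instead of silently overwriting earlier values.
        result[label] = f"{result[label]} {value}" if label in result else value
    return result


def parse_with_libpostal(text):
    """Parse one address string with libpostal (requires pypostal)."""
    # Imported lazily so this module still loads where libpostal is not installed.
    from postal.parser import parse_address

    return components_to_dict(parse_address(text))
```

The exact labels returned (for example house_number, road, city, state, postcode) depend on libpostal's model, so it is worth inspecting the raw pairs on your own data before relying on specific keys.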
Method 1: Using deterministic address matching
One of the most basic ways to match addresses using Python is by comparing two strings for an exact match. It's important to note that this won't account for spelling mistakes, missing words, or parts of the address entered in a different order.
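A small sketch of deterministic matching. The first function is the plain exact comparison described above; the normalize step (lowercasing, dropping punctuation, collapsing whitespace) is an added assumption that makes the comparison slightly more forgiving, and it still cannot handle misspellings or reordered parts.

```python
import re


def normalize(address):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", address.casefold())
    return " ".join(cleaned.split())


def is_exact_match(a, b):
    """Strict string equality: any difference at all counts as a mismatch."""
    return a == b


def is_normalized_match(a, b):
    """Equality after normalization: ignores case, punctuation, and extra spaces."""
    return normalize(a) == normalize(b)


original = "16100 Sand Canyon Avenue, Suite 380 Irvine, CA 92618"
retyped = "16100 sand canyon avenue suite 380  irvine ca 92618"
misspelled = "16100 Sand Canyn Avenue, Suite 380 Irvine, CA 92618"

exact = is_exact_match(original, retyped)                     # differs in case and commas
normalized = is_normalized_match(original, retyped)           # same after cleanup
typo_still_fails = is_normalized_match(original, misspelled)  # a typo is not repaired
```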
Quite a few of these answers are a few years old now.
The most bulletproof library I've seen recently is usaddress: https://github.com/datamade/usaddress
Before that, we had been using the address package for a year: https://pypi.python.org/pypi/address/0.1.1
Pro tip: when testing addresses in all these libraries, use 1) no commas in your address, 2) multi-word city names, preferably with "St." in the name, to see if the library can differentiate between "Street" and "Saint" (e.g., St. Louis), and 3) improper casing. This combo will typically make even the better parsers fall down.
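A hedged sketch of how usaddress is typically called. usaddress.tag returns a mapping of labels to values plus an address type, and raises usaddress.RepeatedLabelError when a label appears in more than one place; usaddress.parse returns raw (token, label) pairs. The helper names, the "Ambiguous" fallback type, and the stress-test strings are assumptions made up for this example, built from the pro tip above.

```python
def group_by_label(pairs):
    """Join the tokens of (token, label) pairs that share a label."""
    grouped = {}
    for token, label in pairs:
        grouped.setdefault(label, []).append(token)
    return {label: " ".join(tokens) for label, tokens in grouped.items()}


def tag_address(text):
    """Tag one US address with usaddress (requires `pip install usaddress`)."""
    # Imported lazily so the helper above can be used without the library.
    import usaddress

    try:
        tagged, address_type = usaddress.tag(text)
        return dict(tagged), address_type
    except usaddress.RepeatedLabelError:
        # Fall back to the raw token labels when tagging is ambiguous.
        return group_by_label(usaddress.parse(text)), "Ambiguous"


# Hypothetical stress cases following the pro tip: no commas,
# a "St." city name, and improper casing.
STRESS_CASES = [
    "123 main st st. louis mo 63101",
    "16100 SAND CANYON AVENUE SUITE 380 IRVINE CA 92618",
]
```

Running tag_address over STRESS_CASES and checking by eye whether "st." lands in the street or the city is a quick way to compare parsers on the hard cases.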