Are there any open source/commercial libraries out there that can detect mailing addresses in text, just like how Apple's Mail app underlines addresses on the Mac/iPhone.
I've been doing a little online research and the ideas seem to be either to use Google, Regex or a full on NLP package such as Stanford's NLP, which usually are pretty massive. I doubt iPhone has a 500MB NLP package in there, or connects to Google every time you read an email. Which makes me to believe there should be an easier way. Too bad UIDataDetectors is not open source.
I know this question has been asked before, but there were no conclusive answers, so here's my try.
As for Python you can try Pyap: https://pypi.python.org/pypi/pyap
It currently supports US and Canadian addresses
Parsing addresses isn't a science. At my office we have been dealing with address parsing for years and the problem is that there aren't any rules about what constitutes a valid address. We use the USPS address database for cleaning addresses which is actually pretty fast and way more accurate than we were ever able to get on our own. It gets us 98% accuracy where as before we got about 90% cleaned addresses.
The bigger problem with address parsing tends to be that people don't input the address the same way. The same address might be in all the following forms.
128 E Beaumont St
128 East Beaumont Street
128 E Bmt St
128 Beaumont Street
128 Highway 88
The third one looks totally wrong but people will type that sometimes. Sometimes a street is also a highway. There are a bunch of possibilities. Just try to catch 90% and you accept that is as good as it gets for address parsing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With