
Detect/Parse Mailing Addresses in Text

Tags: parsing, nlp

Are there any open source or commercial libraries that can detect mailing addresses in text, the way Apple's Mail app underlines addresses on the Mac/iPhone?

I've been doing a little online research, and the options seem to be Google, regex, or a full-on NLP package such as Stanford's NLP, which are usually pretty massive. I doubt the iPhone has a 500MB NLP package in there, or connects to Google every time you read an email, which makes me believe there should be an easier way. Too bad UIDataDetectors is not open source.

I know this question has been asked before, but there were no conclusive answers, so here's my try.

Drew asked Mar 19 '26 20:03


2 Answers

For Python, you can try pyap: https://pypi.python.org/pypi/pyap

It currently supports US and Canadian addresses.
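
A minimal sketch of how that might look, assuming pyap is installed (`pip install pyap`) and using its `parse(text, country=...)` entry point. The wrapper name `find_us_addresses` is made up for illustration, and the import is guarded only so the snippet loads on machines without the package.

```python
try:
    import pyap  # third-party package: pip install pyap
except ImportError:
    pyap = None  # lets the module load; the function below will refuse to run


def find_us_addresses(text):
    """Return the text of each US-style address pyap detects in `text`.

    Hypothetical helper for illustration; it only wraps pyap.parse().
    """
    if pyap is None:
        raise RuntimeError("pyap is not installed")
    # pyap.parse returns a list of Address objects; str() gives the matched text.
    return [str(match) for match in pyap.parse(text, country="US")]
```

Passing `country="CA"` instead should switch to the Canadian patterns. Treat the results as candidates rather than validated addresses, since the library matches on patterns and does not check that the address exists.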

Termos answered Mar 22 '26 22:03


Parsing addresses isn't an exact science. At my office we have dealt with address parsing for years, and the core problem is that there are no firm rules about what constitutes a valid address. We clean addresses against the USPS address database, which is pretty fast and far more accurate than anything we managed on our own. It gets us to about 98% cleaned addresses, whereas before we got about 90%.

The bigger problem with address parsing tends to be that people don't enter addresses the same way. The same address might appear in any of the following forms:

128 E Beaumont St
128 East Beaumont Street
128 E Bmt St
128 Beaumont Street
128 Highway 88

The third one looks totally wrong, but people do type that sometimes. Sometimes a street is also a highway. There are a bunch of possibilities. Aim to catch about 90% and accept that this is as good as it gets for address parsing.
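
To make the point concrete, here is a rough sketch of the kind of rule-based normalization that handles the easy variants and misses the rest. It is not what we run in production; the abbreviation tables and the `normalize_street` name are invented for this example, and real tables are far larger.

```python
import re

# Hypothetical, deliberately tiny lookup tables for illustration only.
DIRECTIONS = {"e": "east", "w": "west", "n": "north", "s": "south"}
SUFFIXES = {"st": "street", "ave": "avenue", "rd": "road",
            "blvd": "boulevard", "hwy": "highway"}


def normalize_street(line):
    """Lowercase a street line and expand a few common abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", line.lower())
    last = len(tokens) - 1
    out = []
    for i, tok in enumerate(tokens):
        if i == 1 and i != last and tok in DIRECTIONS:
            # A direction right after the house number, e.g. "128 E ...".
            out.append(DIRECTIONS[tok])
        elif i == last and tok in SUFFIXES:
            # A street-type abbreviation at the end, e.g. "... St".
            out.append(SUFFIXES[tok])
        else:
            out.append(tok)
    return " ".join(out)


print(normalize_street("128 E Beaumont St"))         # 128 east beaumont street
print(normalize_street("128 East Beaumont Street"))  # 128 east beaumont street
print(normalize_street("128 E Bmt St"))              # 128 east bmt street
print(normalize_street("128 Beaumont Street"))       # 128 beaumont street
```

The first two forms collapse to the same string, but the ad hoc abbreviation, the dropped direction, and the highway alias still come out different. Closing that gap is where a reference database earns its keep.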

Paul Mendoza answered Mar 22 '26 21:03


