
Is there a Java parser that can parse addresses like this? [duplicate]

I'm using Java 6. I'm looking for an automated way to parse addresses. I'm not concerned with whether the addresses actually exist. The best thing I have found is JGeocoder (v 0.4.1), but JGeocoder is unable to parse addresses like this:

16th Street Theater, Berwyn Cultural Center,  6420 16th St.

Does anyone know of a free Java address parser that is up to the challenge? By "parse" I mean the ability to distinguish street, city, state, postal code, and potentially the venue name (the above venue name is "16th Street Theater, Berwyn Cultural Center").

Dave asked Apr 13 '12 19:04


People also ask

How do I parse an address string?

The simplest way to parse an address is with a regular expression. This approach works well only when every address follows the same fixed layout. For example, if all the address strings look like STREET_NAME XX, YYYYYY CITY_NAME, a single regex with capture groups can split each string into [STREET_NAME, XX, YYYYYY, CITY_NAME].
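As a rough illustration of the fixed-layout case, here is a minimal sketch in Java. The class name, the `split` helper, and the exact pattern are assumptions made for this example (a house number of digits with an optional letter suffix, and a 4 to 6 digit postal code); they are not taken from any library.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: handles only "<street name> <house number>, <postal code> <city>".
class FixedFormatAddressSketch {

    // Assumed layout; group 1 = street name, 2 = house number, 3 = postal code, 4 = city.
    private static final Pattern LAYOUT =
            Pattern.compile("^(.+?)\\s+(\\d+\\w*),\\s*(\\d{4,6})\\s+(.+)$");

    /** Returns the four parts, or null when the input does not follow the assumed layout. */
    static String[] split(String address) {
        Matcher m = LAYOUT.matcher(address.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        // An input in the assumed layout is split into its four parts.
        System.out.println(Arrays.toString(split("Main Street 12, 123456 Springfield")));
        // The address from the question does not follow the layout, so split returns null.
        System.out.println(
                split("16th Street Theater, Berwyn Cultural Center,  6420 16th St.") == null);
    }
}
```

Note that the address from the question is rejected outright, which is the usual limitation of a single fixed-layout pattern.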

What is parser parse in Java?

A parser is a Java class that extracts attributes from a local file and stores the information in the repository. More specifically, in the case of a document, a parser: Takes in an InputStream or Reader object. Processes the character input, extracting attributes as it goes.

What does it mean to parse an address?

The address is parsed – so household number, address, abbreviations, mis-spellings, etc. are logically separated. The address is standardized – once parsed, the address is then reformatted to a standard.

What is parse in Java with example?

Parsing means reading a value in one form and converting it to another type. For example, you may have a string with the value "10". Internally that string contains the Unicode characters '1' and '0', not the number 10. The method Integer.parseInt takes that string and returns the int value 10.
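A small sketch of that conversion follows. `Integer.parseInt` and `NumberFormatException` are standard Java; the `tryParse` wrapper and the class name are hypothetical and exist only for this example.

```java
// Hypothetical wrapper around Integer.parseInt for illustration.
class ParseIntSketch {

    /** Returns the parsed value, or null when the text is not a valid decimal integer. */
    static Integer tryParse(String text) {
        try {
            // parseInt rejects surrounding whitespace, so trim first.
            return Integer.valueOf(Integer.parseInt(text.trim()));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        int n = Integer.parseInt("10"); // the characters '1' and '0' become the int 10
        System.out.println(n + 1);
        System.out.println(tryParse("ten")); // not a number, so the wrapper returns null
    }
}
```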


1 Answer

Update: This topic is more exhaustively covered in this StackOverflow question.


I work for SmartyStreets where we parse and process addresses, and we have an answer. This is what we call "SLAP" or Single-Line Address Parsing (or Processing). The formal term is Named Entity Recognition (NER).

I'm not an expert on Java libraries, but I do know that in-house implementations tend not to live up to expectations. Here are some common reasons people I've helped have run into difficulty:

  • Google / Yahoo! / Bing Maps web services do not allow automated queries and do not verify accuracy of the parsed address.

  • In-house code can only make a best guess without any knowledge of existing addresses (a database) or other official sources. I know you want a library that can do this in-house, but you can at best make a guess...

  • By the way, regular expressions are not the answer. The best regex I've seen to parse addresses was dynamically generated over hundreds of lines of code and several classes. It was a mess, and was only correct for types of addresses you'd expect, not all the valid (US) formats there actually are.
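To make the "best guess" point concrete, here is a naive sketch of what in-house code might do for the address in the question. Everything in it is an assumption made for illustration: the class and method names are hypothetical, and the heuristic (the street line is the first comma-separated segment that starts with a purely numeric house number) is just one plausible guess, not how any real service works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical best-guess splitter: venue name vs. street line.
class VenueSplitSketch {

    // Assumption: a street line starts with digits followed by whitespace, e.g. "6420 16th St."
    private static final Pattern STREET_START = Pattern.compile("^\\d+\\s+\\S.*");

    /** Returns {venue, street}; street is "" when no segment looks like a street line. */
    static String[] guess(String input) {
        String[] segments = input.split(",");
        List<String> venue = new ArrayList<String>();
        for (int i = 0; i < segments.length; i++) {
            String segment = segments[i].trim();
            if (STREET_START.matcher(segment).matches()) {
                // Treat this segment and everything after it as the address part.
                List<String> street = new ArrayList<String>();
                for (int j = i; j < segments.length; j++) {
                    street.add(segments[j].trim());
                }
                return new String[] { join(venue), join(street) };
            }
            venue.add(segment);
        }
        return new String[] { join(venue), "" };
    }

    private static String join(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(parts.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] ok = guess("16th Street Theater, Berwyn Cultural Center,  6420 16th St.");
        System.out.println("venue=" + ok[0] + " | street=" + ok[1]);
        // A venue whose name starts with a number fools the heuristic.
        String[] wrong = guess("9 Lives Bar, 12 Main St.");
        System.out.println("venue=" + wrong[0] + " | street=" + wrong[1]);
    }
}
```

The guess happens to work for the address in the question, but a venue such as "9 Lives Bar" is mistaken for a street line. Without reference data, the code has no way to tell the two apart.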

This is an incredibly complex task... unless you have the right tools. One of our services is called LiveAddress API, and it's similar to Google Maps in that it parses addresses and geocodes them, but goes a step further by being CASS-Certified and returning only valid addresses, almost no matter the input format.

I encourage you to do some research of your own, but this is probably the most effective and reliable method.

Matt answered Nov 15 '22 19:11