Are there any good open source geoparsers? There are several free solutions (services) available (e.g. Yahoo's Placemaker, EDINA's Unlock Text), but they do not appear to be open source.
Ideally the parser should be aimed at mining location information from arbitrary text (as opposed to expecting the input to be a location, like Google's Geocoding API or GeoNames' search API), but such suggestions are welcome as well.
Thanks in advance.
Related question on SO: Identifying geographical locations in text
UPDATE: Apparently Unlock Text is based on the "Edinburgh Geoparser", which is open source (GPL) but not currently publicly downloadable (source).
Fairly recent evaluation of geoparsers: http://www.scribd.com/doc/41603112/geoparser
This one seems pretty cool, but the implementation assumes US addresses: http://openblockproject.org/docs/index.html
List of parsers found so far:
- JGeocoder http://jgeocoder.sourceforge.net/parser.html
- Gisgraphy http://www.gisgraphy.com/
- Geotools http://www.geotools.org/
(geotools does not seem to provide geocoding (http://osgeo-org.1560.n6.nabble.com/Review-or-Suggestion-for-Geocoding-Service-in-US-td4991055.html))
Some other resources:
- http://www.osgeo.org/
- http://lin-ear-th-inking.blogspot.co.uk/2010/03/open-source-geocoders.html
- http://lin-ear-th-inking.blogspot.co.uk/2010/03/more-open-source-geocoders.html
- Reverse geotagging
- Geohack
Libpostal uses machine learning and is informed by tens of millions of real-world addresses from OpenStreetMap. The entire pipeline for training the models is open source.
Perhaps the greatest weakness of current geoparsing technology is the parsers' poor handling of metonymy. That is, they rely too much on syntactic pattern matching and on the dominant sense of the entity in question, at the expense of taking important cues from the surrounding words.
The easiest way to parse an address is by applying a regex. This method works well when the addresses have a regular form. For example, if all the address strings look like STREET_NAME XX, YYYYYY CITY_NAME, you can write a regexp that splits each string into [STREET_NAME, XX, YYYYYY, CITY_NAME].
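A minimal Python sketch of the regex approach for the STREET_NAME XX, YYYYYY CITY_NAME form. The function name `parse_regular_address` is made up for illustration, and the pattern assumes XX is a house number (digits plus an optional letter) and YYYYYY is a 4 to 6 digit postal code. Adjust both to whatever your data actually looks like.

```python
import re

# Assumed layout: STREET_NAME XX, YYYYYY CITY_NAME
# (house number and postal code formats are assumptions, not a standard)
ADDRESS_RE = re.compile(
    r"""
    ^\s*
    (?P<street>.+?)\s+          # STREET_NAME, may contain spaces
    (?P<number>\d+\w?)\s*,\s*   # XX, e.g. "12" or "12b", followed by a comma
    (?P<postcode>\d{4,6})\s+    # YYYYYY, assumed to be 4 to 6 digits
    (?P<city>.+?)               # CITY_NAME, may contain spaces
    \s*$
    """,
    re.VERBOSE,
)


def parse_regular_address(text):
    """Split a regular-form address into its four parts, or return None."""
    match = ADDRESS_RE.match(text)
    if match is None:
        return None
    return [
        match.group("street"),
        match.group("number"),
        match.group("postcode"),
        match.group("city"),
    ]


if __name__ == "__main__":
    print(parse_regular_address("Baker Street 221, 101000 London"))
    print(parse_regular_address("somewhere near Paris"))
```

Strings that do not follow the expected form return None instead of a partial guess, so this only helps with already well-structured address data. It will not find locations mentioned in arbitrary text.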
CLAVIN seems like a possible option.
From the website: "CLAVIN (Cartographic Location And Vicinity INdexer) is an award-winning open source software package for document geotagging and geoparsing that employs context-based geographic entity resolution."