
Methods for extracting locations from text?

What are the recommended methods for extracting locations from free text?

One thing I can think of is regex rules such as "words ... in location". Are there better approaches than this?
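A minimal sketch of that regex idea, assuming the location is capitalised and directly follows a preposition. The preposition list, the pattern, and the function name `regex_locations` are illustrative choices, not an established rule set:

```python
import re

# Hypothetical rule: a preposition followed by one or more capitalised words.
LOCATION_PATTERN = re.compile(
    r"\b(?:in|at|from|near)\s+([A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*)"
)


def regex_locations(text):
    """Return capitalised phrases that directly follow a location preposition."""
    return LOCATION_PATTERN.findall(text)
```

A rule like this misses lowercase text such as "in paris", which is common in tweets, and it matches any capitalised phrase after the preposition, whether or not it is a place.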

I can also think of keeping a lookup hash table of country and city names and comparing every token extracted from the text against it.
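Roughly, the lookup idea could look like the sketch below. The tiny `GAZETTEER` set and the function name `find_locations` are made up for illustration; a real list of countries and cities would be loaded from a file or database. It tries the longest multi-word name first so that "new york" is matched as one name:

```python
import re

# Hypothetical, tiny gazetteer; names are stored lowercased.
GAZETTEER = {"paris", "new york", "london", "united kingdom"}
MAX_WORDS = max(len(name.split()) for name in GAZETTEER)


def find_locations(text, gazetteer=GAZETTEER, max_words=MAX_WORDS):
    """Return gazetteer names found in text, preferring the longest match."""
    # Lowercase the text and keep runs of letters as tokens.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i first.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in gazetteer:
                found.append(candidate)
                i += n
                break
        else:
            # No phrase starting here is in the gazetteer; move on.
            i += 1
    return found
```

Set membership is a constant-time check on average, so this scales to many tweets, but it cannot tell a place from a person: "Paris Hilton tweeted" still yields "paris".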

Does anybody know of better approaches?

Edit: I'm trying to extract locations from the text of tweets, so the large number of tweets might also affect my choice of method.

Jack Twain asked Jul 20 '13 12:07




1 Answer

All rule-based approaches will fail (if your text is really "free"). That includes regex, context-free grammars, any kind of lookup... Believe me, I've been there before :-)

This problem is called Named Entity Recognition (NER). Location is one of the three most studied entity classes (along with Person and Organization). Stanford NLP has an extremely powerful open-source Java implementation: http://nlp.stanford.edu/software/CRF-NER.shtml

You can easily find implementations in other programming languages.
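As one example in Python, here is a rough sketch using spaCy rather than the Stanford tool linked above. It assumes spaCy and its `en_core_web_sm` model are installed; the helper names `load_pipeline` and `extract_locations` are made up for this sketch, and the label set assumes the OntoNotes-style labels used by spaCy's English models (`GPE`, `LOC`, `FAC`):

```python
# Entity labels treated as locations (assumption: OntoNotes-style label names).
LOCATION_LABELS = frozenset({"GPE", "LOC", "FAC"})


def load_pipeline(model_name="en_core_web_sm"):
    """Load a spaCy pipeline; requires spaCy and the named model to be installed."""
    import spacy  # imported here so the rest of the sketch works without spaCy

    return spacy.load(model_name)


def extract_locations(texts, nlp, labels=LOCATION_LABELS):
    """Return one list of location strings per input text."""
    results = []
    # nlp.pipe processes texts in batches, which helps with large tweet volumes.
    for doc in nlp.pipe(texts):
        results.append([ent.text for ent in doc.ents if ent.label_ in labels])
    return results
```

Calling `extract_locations(tweets, load_pipeline())` returns one list per tweet. What it finds depends on the model, and a model trained mostly on edited text may do worse on informal tweets.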

Blacksad answered Nov 08 '22 14:11