
Recognize partial/complete address with NLP framework

How much work is involved in extracting partial (without a city) or complete postal addresses from unstructured text with an NLP framework? Are NLP frameworks effective for this? Also, how difficult is it to "train" Named Entity Recognition modules to recognize new locations?

Steeve asked Nov 16 '14 08:11


1 Answer

As long as most addresses are correctly formatted and regular, i.e. they contain a contact name, street number, and street name separated by commas, rule-based frameworks may be sufficient.
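To illustrate the rule-based case, here is a minimal sketch using a single regular expression. The pattern, the group names, and the `extract_addresses` helper are all assumptions made for this example: it expects "Name, number street" with an optional ", City" and capitalized words, which real data will often violate.

```python
import re

# Hypothetical building block: one or more capitalized words ("Baker Street").
WORDS = r"[A-Z][\w'-]*(?: [A-Z][\w'-]*)*"

# Assumed layout: "<contact name>, <street number> <street name>[, <city>]".
ADDRESS_RE = re.compile(
    rf"(?P<name>{WORDS}), "
    rf"(?P<number>\d+[A-Za-z]?) "
    rf"(?P<street>{WORDS})"
    rf"(?:, (?P<city>{WORDS}))?"  # the city is optional (partial address)
)


def extract_addresses(text):
    """Return one dict per match; a missing city is reported as None."""
    return [m.groupdict() for m in ADDRESS_RE.finditer(text)]


full = extract_addresses("Send it to John Smith, 42 Baker Street, London before Friday")
partial = extract_addresses("Ship to Jane Doe, 7 Elm Road today")
```

A pattern like this breaks as soon as the text drifts from the expected layout (lowercase street names, abbreviations such as "St.", a capitalized word right before the name), which is where the statistical approaches come in.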

Unstructured or partially structured text will require more preprocessing and statistical modeling, e.g. morpho-syntactic analysis and CRFs (conditional random fields). The Stanford tools are the most popular for this purpose. It may also be worth searching for a corpus with finer-grained annotations: not only "LOC", but also "NUMBER", "STREETNAME", "CITY", etc., so that locations can be extracted even when they are incomplete. For this kind of annotation, you may want to look at tree-structured approaches.
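To show why fine-grained labels help with incomplete addresses, here is a small sketch of the post-processing step only. It assumes some sequence tagger (for instance a CRF) has already assigned one label per token from a hypothetical set of "NUMBER", "STREETNAME", "CITY", and "O" for everything else; the `group_address_spans` helper is made up for this example and does no tagging itself.

```python
from itertools import groupby


def group_address_spans(tokens, labels):
    """Merge runs of consecutive non-"O" tokens into address candidates.

    Each candidate maps a label to the joined tokens carrying that label,
    so a partial address simply lacks some keys (e.g. no "CITY").
    """
    spans = []
    pairs = zip(tokens, labels)
    for is_address, run in groupby(pairs, key=lambda pair: pair[1] != "O"):
        if not is_address:
            continue
        parts = {}
        for token, label in run:
            parts.setdefault(label, []).append(token)
        spans.append({label: " ".join(toks) for label, toks in parts.items()})
    return spans


tokens = ["Meet", "me", "at", "12", "Rue", "Lepic", "tomorrow"]
labels = ["O", "O", "O", "NUMBER", "STREETNAME", "STREETNAME", "O"]
candidates = group_address_spans(tokens, labels)
```

With a single "LOC" label the same text would yield one opaque span; with the finer labels you can tell that the city is missing and decide how to handle it. Note that any token tagged "O" inside an address, such as a comma, splits the span in this sketch.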

So the amount of work mostly depends on how regular the expressions you are looking for are.

eldams answered Nov 13 '22 09:11