Using Stanford NER to extract an address from a text document?

I was looking at Stanford NER and thinking of using its Java APIs to extract postal addresses from a text document. The document may be any document that contains a postal address section, e.g. utility bills or electricity bills.

The approach I am considering is:

  1. Define postal address as a named entity using LOCATION and other primitive named entities.
  2. Define segmentation and the other sub-processes.
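As a rough illustration of step 1 (composing a postal address out of primitive named entities), here is a minimal Java sketch. It assumes the text has already been tokenized and tagged, e.g. by Stanford NER, and works only on plain word/tag pairs. The `Tagged` type, the `NUMBER`/`LOCATION` tag set and the connector words are hypothetical choices for illustration, not part of the Stanford NER API; note that the 3-class English model only emits PERSON, LOCATION and ORGANIZATION, so a label like NUMBER would have to come from a richer model or annotator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch only: the input type, tag set and connector words are hypothetical
// illustration choices, not part of the Stanford NER API.
final class AddressSpans {

    /** A word paired with the NER label some tagger assigned to it. */
    record Tagged(String word, String tag) {}

    // Labels treated as "primitive" address parts (assumption).
    private static final Set<String> ADDRESS_TAGS = Set.of("NUMBER", "LOCATION");

    // Untagged words allowed to join address parts (assumption; extend for real data).
    private static final Set<String> CONNECTORS = Set.of(",", "Street", "Road", "Avenue");

    /** Returns each maximal run of address-like tokens that contains at least one LOCATION. */
    static List<String> candidateAddresses(List<Tagged> tokens) {
        List<String> result = new ArrayList<>();
        List<String> run = new ArrayList<>();
        boolean hasLocation = false;
        for (Tagged t : tokens) {
            if (ADDRESS_TAGS.contains(t.tag()) || CONNECTORS.contains(t.word())) {
                run.add(t.word());
                hasLocation |= t.tag().equals("LOCATION");
            } else {
                flush(run, hasLocation, result);
                hasLocation = false;
            }
        }
        flush(run, hasLocation, result);
        return result;
    }

    // Emits the run as one string if it qualifies, then clears it for the next run.
    private static void flush(List<String> run, boolean hasLocation, List<String> out) {
        // Drop commas dangling at either end of the run.
        while (!run.isEmpty() && run.get(0).equals(",")) run.remove(0);
        while (!run.isEmpty() && run.get(run.size() - 1).equals(",")) run.remove(run.size() - 1);
        if (hasLocation && !run.isEmpty()) {
            StringBuilder sb = new StringBuilder();
            for (String w : run) {
                // No space before a comma; single spaces elsewhere.
                if (sb.length() > 0 && !w.equals(",")) sb.append(' ');
                sb.append(w);
            }
            out.add(sb.toString());
        }
        run.clear();
    }
}
```

For the hand-labelled tokens `Invoice/O 2023/NUMBER sent/O to/O 12/NUMBER Main/LOCATION Street/O ,/O Springfield/LOCATION ./O`, `candidateAddresses` returns `[12 Main Street, Springfield]`; the lone `2023` is dropped because its run contains no LOCATION token. A rule like this is only a baseline: real bills will need richer tags and connector lists.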

I am trying to find an example pipeline for this (i.e. the detailed steps required). Has anyone done this before? Suggestions are welcome.

asked Dec 22 '15 at 04:12 by yadab


1 Answer

To be clear: all credit goes to Raj Vardhan (and John Bauer), who discussed this on the [java-nlp-user] mailing list.

Raj Vardhan wrote about the plan to work on "finding street address in a sentence":

Here is an approach I have thought of:

  1. Find the event-anchor in a sentence.
  2. Select outgoing-edges in the SemanticGraph from that event-node with relations such as "prep-in" or "prep-at".
  3. IF the dependent value in the relation has the POS tag NNP:

     a) Find outgoing-edges from the dependent value's node with relations such as "nn".

     b) Connect all such nodes in increasing order of occurrence in the sentence.

     c) PRINT the resulting value as the Location where the event occurred.

This obviously relies on certain assumptions, such as a direct dependency between the event-anchor and the location in the sentence.
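The steps in Raj Vardhan's outline can be sketched in plain Java roughly as below. This is a hedged illustration, not CoreNLP code: `Token` and `Edge` are hypothetical stand-ins for CoreNLP's `IndexedWord` and `SemanticGraphEdge`, the event anchor (step 1) is assumed to be already known and passed in, and the relation names are written as in Stanford's collapsed dependencies (`prep_in`, `prep_at`, `nn`). Newer CoreNLP versions default to Universal Dependencies, where the relation names differ, so the constants would need adjusting.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch only: Token and Edge are hypothetical stand-ins for CoreNLP's
// IndexedWord and SemanticGraphEdge; the event anchor (step 1) is given as input.
final class EventLocation {

    /** 1-based position in the sentence, surface word, and POS tag. */
    record Token(int index, String word, String pos) {}

    /** A dependency edge: governor --relation--> dependent. */
    record Edge(Token gov, String relation, Token dep) {}

    // Relation names as written in Stanford's collapsed dependencies.
    private static final Set<String> LOCATION_RELATIONS = Set.of("prep_in", "prep_at");

    static List<String> locationsOf(Token anchor, List<Edge> edges) {
        List<String> locations = new ArrayList<>();
        for (Edge e : edges) {
            // Step 2: keep only the anchor's outgoing prep_in / prep_at edges.
            if (!e.gov().equals(anchor) || !LOCATION_RELATIONS.contains(e.relation())) {
                continue;
            }
            Token head = e.dep();
            // Step 3: the dependent must be tagged as a proper noun.
            if (!head.pos().equals("NNP")) {
                continue;
            }
            // Step 3a: gather the head together with its "nn" (noun compound) dependents.
            List<Token> parts = new ArrayList<>();
            parts.add(head);
            for (Edge m : edges) {
                if (m.gov().equals(head) && m.relation().equals("nn")) {
                    parts.add(m.dep());
                }
            }
            // Step 3b: order the words by their position in the sentence.
            parts.sort(Comparator.comparingInt(Token::index));
            // Step 3c: report the joined phrase as the event's location.
            locations.add(parts.stream().map(Token::word).collect(Collectors.joining(" ")));
        }
        return locations;
    }
}
```

For "The meeting was held in New York City.", with hand-built edges `prep_in(held, City)`, `nn(City, York)` and `nn(City, New)` and `held` as the anchor, `locationsOf` returns `[New York City]`. A common-noun object such as "in the park" (POS `NN`) is skipped by the NNP check, which mirrors the direct-dependency assumption stated in the quote.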

I am not sure whether this will help you, but I wanted to mention it just in case. Again, all credit should go to Raj Vardhan (and John Bauer).

answered Nov 16 '22 at 10:11 by Freek de Bruijn