I have a collection of bills and invoices, so there is no context in the text (I mean they don't tell a story). I want to extract people's names from those bills. I tried OpenNLP, but the quality of the trained model is not good because I don't have context. So the first question is: can I train a model that contains only people's names, without context? And if that is possible, can you point me to a good article on how to build such a model? (Most of the articles I read didn't explain the steps I should take to build a new model.)
I have a database with more than 100,000 person names (first name, last name). If NER systems don't work in my case (because there is no context), what is the best way to search for those candidates? (I mean, do I have to search for every first name combined with every last name?)
Thanks.
Regarding "context", I guess you mean that you don't have entire sentences, i.e. no previous/next tokens; in that case you are facing quite a non-standard NER problem. I am not aware of available software or training data for this particular problem. If you find none, you'll have to build your own corpus for training and/or evaluation purposes.
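Once you have hand-annotated some bills, a simple way to evaluate any extractor against them is exact-match precision, recall and F1. The sketch below assumes (hypothetically) that each name is stored as a `(bill_id, "First Last")` tuple; the function name `precision_recall_f1` is illustrative, not from any library.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation of one name mention: (bill_id, "First Last").
Mention = Tuple[str, str]


def precision_recall_f1(predicted: Iterable[Mention],
                        gold: Iterable[Mention]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 against a hand-annotated set."""
    pred_set: Set[Mention] = set(predicted)
    gold_set: Set[Mention] = set(gold)
    true_pos = len(pred_set & gold_set)  # mentions found and correct
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {("bill-001", "John Smith"), ("bill-002", "Maria Lopez")}
predicted = {("bill-001", "John Smith"), ("bill-002", "Lopez Trading")}
print(precision_recall_f1(predicted, gold))  # (0.5, 0.5, 0.5)
```

Exact matching is strict; you may want to normalize case or whitespace on both sides before comparing, depending on how your annotations are written.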
Your database of names will probably help a lot, depending on what proportion of the names on the bills are actually present in the database. You'll probably also have to rely on the character-level morphology of names, as patterns (see for instance the patterns in [1]). Once you have a training set with features (presence in the database, morphology, other information from the bill) and solutions (the actual names in the annotated bills), applying a standard machine-learning method such as an SVM will be quite straightforward (if you are not familiar with this, just ask).
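As a rough illustration of the features described above, here is a minimal per-token feature extractor. It assumes your database has been loaded into two lowercase sets (`first_names`, `last_names`); the "word shape" feature (uppercase to `X`, lowercase to `x`, digit to `d`, repeats collapsed) is one common way to encode character-level morphology, not necessarily the exact pattern set used in [1]. All function names here are hypothetical.

```python
from typing import Dict, Set


def word_shape(token: str) -> str:
    """Map characters to classes (upper 'X', lower 'x', digit 'd', others kept),
    collapsing consecutive repeats: 'Smith' -> 'Xx', 'SMITH' -> 'X'."""
    shape = []
    for ch in token:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch  # keep punctuation such as ' or - as-is
        if not shape or shape[-1] != c:
            shape.append(c)
    return "".join(shape)


def token_features(token: str, first_names: Set[str],
                   last_names: Set[str]) -> Dict[str, float]:
    """Sparse feature dict for one token; the name sets must be lowercase."""
    key = token.casefold()
    return {
        "in_first_names": float(key in first_names),  # database lookup
        "in_last_names": float(key in last_names),
        "shape=" + word_shape(token): 1.0,            # morphology pattern
        "suffix3=" + key[-3:]: 1.0,                   # last three characters
        "is_title": float(token.istitle()),
        "has_digit": float(any(ch.isdigit() for ch in token)),
    }


print(word_shape("O'Brien"))  # X'Xx
print(token_features("Smith", {"john"}, {"smith"}))
```

Dictionaries like these can be turned into vectors (for example with scikit-learn's `DictVectorizer`) and fed to a linear SVM such as `LinearSVC`, with labels taken from your annotated bills; that library choice is only one option.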
Some other suggestions:
[1] Ranking algorithms for named-entity extraction: Boosting and the voted perceptron (Michael Collins, 2002)
I'd start with some regular expressions, then possibly augment that with a dictionary-based approach (i.e., a big list of names).
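A minimal sketch of the regex-plus-dictionary idea, which also speaks to the question of searching 100,000 names: rather than combining every first name with every last name, load the names into hash sets and test each pair of adjacent tokens on a line. The cost then grows with the length of the bill, not with the size of the name list. It assumes lowercase name sets, ASCII capitalized tokens, and names written "First Last" on one line; accented names or "Last, First" ordering would need adjustments. The names `NAME_TOKEN` and `find_name_candidates` are hypothetical.

```python
import re
from typing import List, Set, Tuple

# Assumption: a name-like token starts with an uppercase ASCII letter and
# contains only letters, apostrophes or hyphens ("Smith", "O'Brien", "SMITH").
NAME_TOKEN = re.compile(r"[A-Z][A-Za-z'\-]*")


def find_name_candidates(text: str, first_names: Set[str],
                         last_names: Set[str]) -> List[Tuple[str, str]]:
    """Return adjacent (first, last) token pairs found in the name sets.

    Each check is a set membership test, so no first x last combinations
    are ever enumerated. The name sets must contain lowercase entries."""
    candidates = []
    for line in text.splitlines():
        # Split on whitespace and common separators; pairs never cross lines.
        tokens = [t for t in re.split(r"[\s,;:]+", line) if t]
        for first, last in zip(tokens, tokens[1:]):
            # Regex filter: both tokens must look like capitalized words.
            if not (NAME_TOKEN.fullmatch(first) and NAME_TOKEN.fullmatch(last)):
                continue
            # Dictionary filter: both tokens must appear in the database.
            if first.casefold() in first_names and last.casefold() in last_names:
                candidates.append((first, last))
    return candidates


bill = "INVOICE 4711\nBill to: John Smith\nQty 2 x Widget 12.50\nContact: MARIA LOPEZ"
print(find_name_candidates(bill, {"john", "maria"}, {"smith", "lopez"}))
# [('John', 'Smith'), ('MARIA', 'LOPEZ')]
```

The candidates this returns could serve directly as output, or as the "presence in database" signal for a trained classifier.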
No matter what you do, it won't be perfect, so be sure to keep that in mind.