NLTK Named Entity Recognition with Custom Data

I'm trying to extract named entities from my text using NLTK. NLTK's built-in NER is not accurate enough for my purposes, and I also want to add some tags of my own. I've been trying to find a way to train my own NER model, but I can't find the right resources. I have a few questions about NLTK:

  1. Can I use my own data to train a named entity recognizer in NLTK?
  2. If I can train on my own data, is named_entity.py the file to modify?
  3. Does the input file have to be in IOB format, e.g. Eric NNP B-PERSON?
  4. Are there any resources, apart from the NLTK cookbook and NLP with Python, that I can use?

I would really appreciate any help with this.

user1502248 asked Jul 04 '12 18:07


1 Answer

Are you committed to using NLTK/Python? I ran into the same problems as you and had much better results with Stanford's named-entity recognizer: http://nlp.stanford.edu/software/CRF-NER.shtml. The process for training the classifier on your own data is very well documented in the FAQ.
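If your data is already in the "word POS IOB-tag" layout from the question (Eric NNP B-PERSON), a small conversion step may be all you need before training. The sketch below is only an illustration under some assumptions: it assumes the trainer wants a two-column, tab-separated file with one token per line, its label in the second column, and O for non-entity tokens, so check the FAQ for the exact layout and properties it expects. The helper names, the sample sentence, and the choice to drop the B-/I- prefixes are all hypothetical and not taken from the Stanford documentation.

```python
# Sketch: convert "word POS IOB-tag" lines into two-column "word<TAB>label" rows.
# Assumptions (not from the original answer): the helper names, the sample
# sentence, and the decision to drop the B-/I- prefixes.


def parse_iob_line(line):
    """Split a line such as 'Eric NNP B-PERSON' into (word, pos, iob_tag)."""
    word, pos, iob_tag = line.split()
    return word, pos, iob_tag


def strip_iob_prefix(iob_tag):
    """Map 'B-PERSON' or 'I-PERSON' to 'PERSON'; leave 'O' unchanged."""
    if iob_tag == "O":
        return "O"
    prefix, _, label = iob_tag.partition("-")
    if prefix not in ("B", "I") or not label:
        raise ValueError(f"unexpected IOB tag: {iob_tag!r}")
    return label


def iob_to_two_column(lines):
    """Return 'word<TAB>label' rows; blank lines (sentence breaks) stay blank."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            rows.append("")
            continue
        word, _pos, iob_tag = parse_iob_line(line)
        rows.append(f"{word}\t{strip_iob_prefix(iob_tag)}")
    return rows


# Hypothetical sample data in the format shown in the question.
sample = [
    "Eric NNP B-PERSON",
    "Smith NNP I-PERSON",
    "visited VBD O",
    "Paris NNP B-LOCATION",
    "",
]
converted = iob_to_two_column(sample)
print("\n".join(converted))
```

One caveat on the design choice: dropping the B-/I- prefixes means two adjacent entities of the same type can no longer be told apart, so keep the prefixes if that distinction matters for your data.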

If you really need to use NLTK, I'd hit up the mailing list for some advice from other users: http://groups.google.com/group/nltk-users.

Hope this helps!

jjdubs answered Oct 08 '22 13:10