I am trying to train an NER model on my own data using spaCy. My question is: how do I integrate the trained NER back into the original model, so that it can be conveniently retrained and used in my application? I did not find any sample.
I found the similar examples below for training NER, but none of them seems to save the trained model and integrate it back into spaCy. Some keep the model in memory, and some save the NER model into a separate folder. What is the appropriate way to do this? Thank you!
I am using spaCy 1.7.3.
https://github.com/explosion/spaCy/blob/master/examples/training/train_ner.py
https://github.com/explosion/spacy-dev-resources/blob/master/spacy-annotator/displacy/parse.py
First, load the pre-existing spaCy model you want to use and get the NER pipeline component through the get_pipe() method. Next, store the name of the new category (entity type) in a string variable LABEL. How will the model know which entities belong under the new label? You have to train it with examples. Note that get_pipe() belongs to the spaCy 2.x API, not to the 1.7.3 release mentioned in the question.
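One way to prepare such examples is to compute the character offsets from the entity strings instead of counting them by hand. The sketch below is plain Python with no spaCy dependency; `make_example` and the `GADGET` label are hypothetical names used only for illustration, and only the first occurrence of each entity string is annotated.

```python
LABEL = 'GADGET'  # hypothetical new entity type, for illustration only


def make_example(text, entity_strings, label=LABEL):
    """Return (text, [(start, end, label), ...]) in stand-off format.

    Only the first occurrence of each entity string is annotated.
    """
    offsets = []
    for entity in entity_strings:
        start = text.find(entity)
        if start == -1:
            raise ValueError('%r not found in %r' % (entity, text))
        offsets.append((start, start + len(entity), label))
    return text, sorted(offsets)


train_data = [
    make_example('Who is Chaka Khan?', ['Chaka Khan'], label='PERSON'),
    make_example('I like London and Berlin.', ['London', 'Berlin'], label='LOC'),
    make_example('My new phone is great.', ['phone']),
]
```

The first two entries come out in the same (text, offsets) shape as the hand-written train_data used in the training example, so a list built this way could be passed to the same loop.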
Named Entity Recognition (NER) works by locating the named entities present in unstructured text and classifying them into standard categories such as person names, locations, organizations, time expressions, quantities, monetary values, percentages, codes, etc.
To provide training examples to the entity recogniser, you'll first need to create an instance of the GoldParse class. You can specify your annotations in a stand-off format or as token tags.
import random

import spacy
from spacy.gold import GoldParse
from spacy.language import EntityRecognizer

# Stand-off annotations: (start_char, end_char, label) for each entity.
train_data = [
    ('Who is Chaka Khan?', [(7, 17, 'PERSON')]),
    ('I like London and Berlin.', [(7, 13, 'LOC'), (18, 24, 'LOC')])
]

# Load the model without its own entity recogniser and parser.
nlp = spacy.load('en', entity=False, parser=False)
ner = EntityRecognizer(nlp.vocab, entity_types=['PERSON', 'LOC'])

for itn in range(5):
    random.shuffle(train_data)
    for raw_text, entity_offsets in train_data:
        doc = nlp.make_doc(raw_text)
        gold = GoldParse(doc, entities=entity_offsets)

        # Tag the doc first, then update the recogniser on this example.
        nlp.tagger(doc)
        ner.update(doc, gold)
ner.model.end_training()
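A simpler variant specifies the annotations as token-level BILUO tags instead of character offsets:
from spacy.tokens import Doc

# Reuses nlp, GoldParse and EntityRecognizer from the example above.
doc = Doc(nlp.vocab, [u'rats', u'make', u'good', u'pets'])
# One tag per token: U- marks a single-token entity, O marks no entity.
gold = GoldParse(doc, entities=[u'U-ANIMAL', u'O', u'O', u'O'])
ner = EntityRecognizer(nlp.vocab, entity_types=['ANIMAL'])
ner.update(doc, gold)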
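The stand-off and token-tag styles carry the same information. As a rough illustration, the sketch below converts character offsets into BILUO token tags (B = begin, I = inside, L = last, U = single-token entity, O = outside). It is plain Python and takes the token list as given rather than running spaCy's tokenizer; `offsets_to_biluo` is a hypothetical helper written for this example, not a spaCy function.

```python
def offsets_to_biluo(text, words, entities):
    """Map (start, end, label) character offsets onto BILUO token tags."""
    # Locate each word in the text to get its character span.
    spans = []
    pos = 0
    for word in words:
        start = text.index(word, pos)
        pos = start + len(word)
        spans.append((start, pos))

    tags = ['O'] * len(words)
    for ent_start, ent_end, label in entities:
        # Tokens that lie fully inside the entity span.
        inside = [i for i, (s, e) in enumerate(spans)
                  if s >= ent_start and e <= ent_end]
        if not inside:
            continue  # entity does not align with any token
        if len(inside) == 1:
            tags[inside[0]] = 'U-' + label
        else:
            tags[inside[0]] = 'B-' + label
            for i in inside[1:-1]:
                tags[i] = 'I-' + label
            tags[inside[-1]] = 'L-' + label
    return tags


print(offsets_to_biluo('rats make good pets',
                       ['rats', 'make', 'good', 'pets'],
                       [(0, 4, 'ANIMAL')]))
# prints ['U-ANIMAL', 'O', 'O', 'O']
print(offsets_to_biluo('Who is Chaka Khan?',
                       ['Who', 'is', 'Chaka', 'Khan', '?'],
                       [(7, 17, 'PERSON')]))
# prints ['O', 'O', 'B-PERSON', 'L-PERSON', 'O']
```

An entity whose offsets do not line up with token boundaries is silently skipped here; raising an error instead may be the safer choice when preparing real training data.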
https://spacy.io/docs/usage/training-ner