I am trying to evaluate a trained NER model created with the spaCy library. Normally for this kind of problem you would use the F1 score (the harmonic mean of precision and recall), but I could not find an evaluation function for a trained NER model in the documentation.
I am not sure if it is correct, but I am trying to do it the following way (example), using f1_score from sklearn:

    from sklearn.metrics import f1_score
    import spacy
    from spacy.gold import GoldParse

    nlp = spacy.load("en")  # load the pretrained model with its NER component
    test_text = "my name is John"  # text to test accuracy on
    doc_to_test = nlp(test_text)  # run the model to get a spaCy Doc with predicted entities

    # create a gold doc where we know the tagged entity for the text to be tested
    doc_gold_text = nlp.make_doc(test_text)
    entity_offsets_of_gold_text = [(11, 15, "PERSON")]
    gold = GoldParse(doc_gold_text, entities=entity_offsets_of_gold_text)

    # bring the data into a format acceptable for sklearn's f1 function
    y_true = ["PERSON" if "PERSON" in x else 'O' for x in gold.ner]
    y_predicted = [x.ent_type_ if x.ent_type_ != '' else 'O' for x in doc_to_test]

    f1_score(y_true, y_predicted, average='macro')
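As a quick sanity check, something like the following (just a rough sketch using the variables defined above) prints the gold and predicted labels token by token, so you can confirm that the two sequences are aligned before computing F1:

    # Compare gold vs. predicted label per token
    for token, true_label, pred_label in zip(doc_to_test, y_true, y_predicted):
        print(f"{token.text:10} gold={true_label:8} pred={pred_label}")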
Any thoughts or insights would be useful.
You can find different metrics including F-score, recall and precision in spaCy/scorer.py.
This example shows how you can use it:
    import spacy
    from spacy.gold import GoldParse
    from spacy.scorer import Scorer

    def evaluate(ner_model, examples):
        scorer = Scorer()
        for input_, annot in examples:
            doc_gold_text = ner_model.make_doc(input_)
            gold = GoldParse(doc_gold_text, entities=annot)
            pred_value = ner_model(input_)
            scorer.score(pred_value, gold)
        return scorer.scores

    # example run
    examples = [
        ('Who is Shaka Khan?', [(7, 17, 'PERSON')]),
        ('I like London and Berlin.', [(7, 13, 'LOC'), (18, 24, 'LOC')]),
    ]

    ner_model = spacy.load(ner_model_path)  # for spaCy's pretrained model use 'en_core_web_sm'
    results = evaluate(ner_model, examples)
scorer.scores returns multiple scores. When running the example, the result looks like this. (Note that the entity scores are low because the examples label London and Berlin as 'LOC' while the model predicts 'GPE'; you can see this by looking at ents_per_type.)
    {'uas': 0.0, 'las': 0.0,
     'las_per_type': {'attr': {'p': 0.0, 'r': 0.0, 'f': 0.0},
                      'root': {'p': 0.0, 'r': 0.0, 'f': 0.0},
                      'compound': {'p': 0.0, 'r': 0.0, 'f': 0.0},
                      'nsubj': {'p': 0.0, 'r': 0.0, 'f': 0.0},
                      'dobj': {'p': 0.0, 'r': 0.0, 'f': 0.0},
                      'cc': {'p': 0.0, 'r': 0.0, 'f': 0.0},
                      'conj': {'p': 0.0, 'r': 0.0, 'f': 0.0}},
     'ents_p': 33.33333333333333,
     'ents_r': 33.33333333333333,
     'ents_f': 33.33333333333333,
     'ents_per_type': {'PERSON': {'p': 100.0, 'r': 100.0, 'f': 100.0},
                       'LOC': {'p': 0.0, 'r': 0.0, 'f': 0.0},
                       'GPE': {'p': 0.0, 'r': 0.0, 'f': 0.0}},
     'tags_acc': 0.0,
     'token_acc': 100.0,
     'textcat_score': 0.0,
     'textcats_per_cat': {}}
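To pull individual numbers out of that dictionary, you can index it directly; a minimal sketch using the field names shown in the output above:

    # overall NER precision / recall / F-score
    print(results['ents_p'], results['ents_r'], results['ents_f'])

    # per-label breakdown, e.g. to see why LOC scores 0 while GPE shows up instead
    for label, prf in results['ents_per_type'].items():
        print(label, prf['p'], prf['r'], prf['f'])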
The example is taken from a spaCy example on GitHub (the link no longer works). It was last tested with spaCy 2.2.4.
Note that in spaCy v3 there is an evaluate command you can run from the command line instead of writing custom evaluation code.
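For example, from the shell you can run something like python -m spacy evaluate ./my_trained_model ./dev.spacy --output metrics.json, where the paths are placeholders and the dev data is in spaCy's binary .spacy format. The same evaluation can also be done from Python in v3; a minimal sketch, assuming a trained pipeline saved at a placeholder path:

    import spacy
    from spacy.training import Example

    nlp = spacy.load("./my_trained_model")  # placeholder path to your trained pipeline

    # build Example objects pairing the raw text with the gold entity spans
    examples = [
        Example.from_dict(nlp.make_doc("Who is Shaka Khan?"),
                          {"entities": [(7, 17, "PERSON")]}),
        Example.from_dict(nlp.make_doc("I like London and Berlin."),
                          {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
    ]

    # returns a dict with ents_p, ents_r, ents_f, ents_per_type, ...
    scores = nlp.evaluate(examples)
    print(scores["ents_f"])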