Is there a way with spaCy's NER to calculate metrics per entity type?

Is there a way to get the NER metrics (precision, recall, F1 score) per entity type from a spaCy model?

Something that looks like this:

            precision    recall  f1-score   support

      B-LOC     0.810     0.784     0.797      1084
      I-LOC     0.690     0.637     0.662       325
     B-MISC     0.731     0.569     0.640       339
     I-MISC     0.699     0.589     0.639       557
      B-ORG     0.807     0.832     0.820      1400
      I-ORG     0.852     0.786     0.818      1104
      B-PER     0.850     0.884     0.867       735
      I-PER     0.893     0.943     0.917       634

avg / total     0.809     0.787     0.796      6178

Taken from: http://www.davidsbatista.net/blog/2018/05/09/Named_Entity_Evaluation/

Thank you!


asked Oct 17 '18 13:10

2 Answers

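From spaCy v3 onward, nlp.evaluate() reports both overall and per-entity-type metrics:

# Evaluate a saved pipeline on annotated examples (spaCy v3)
import spacy
from spacy.training import Example

nlp = spacy.load("./model_saved")  # path to your trained pipeline

# Each item is (text, {"entities": [(start_char, end_char, label), ...]})
data = [("Taj mahal is in Agra.", {"entities": [(0, 9, "name"), (16, 20, "place")]})]

examples = []
for text, annots in data:
    doc = nlp.make_doc(text)  # tokenize only; nlp.evaluate runs the predictions
    examples.append(Example.from_dict(doc, annots))

scores = nlp.evaluate(examples)
print(scores)  # overall scores: ents_p, ents_r, ents_f
print(scores.get("ents_per_type"))  # precision/recall/F1 for each entity label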
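To get output closer to the table in the question, the per-type scores can be laid out by hand. The following is a minimal sketch, assuming the scores dict has the shape produced by nlp.evaluate in spaCy v3 (float values under "ents_p", "ents_r", "ents_f", and an "ents_per_type" dict mapping each label to "p", "r" and "f"). The helper name per_type_table and the numbers in the sample dict are made up for illustration. There is no support column, because the scores dict is not assumed to carry entity counts.

```python
def per_type_table(scores):
    """Format NER scores as a plain-text table, one row per entity label."""
    per_type = scores.get("ents_per_type") or {}
    lines = [f"{'':>12}{'precision':>11}{'recall':>10}{'f1-score':>10}"]
    for label in sorted(per_type):
        m = per_type[label]
        lines.append(f"{label:>12}{m['p']:>11.3f}{m['r']:>10.3f}{m['f']:>10.3f}")
    # Final row: the scores computed over all entity types together
    lines.append(
        f"{'overall':>12}{scores['ents_p']:>11.3f}"
        f"{scores['ents_r']:>10.3f}{scores['ents_f']:>10.3f}"
    )
    return "\n".join(lines)


# Hypothetical scores, invented for this example (not real model output)
scores = {
    "ents_p": 0.75,
    "ents_r": 0.6,
    "ents_f": 0.667,
    "ents_per_type": {
        "name": {"p": 1.0, "r": 0.5, "f": 0.667},
        "place": {"p": 0.5, "r": 1.0, "f": 0.667},
    },
}
print(per_type_table(scores))
```

In practice you would pass the dict returned by nlp.evaluate(examples) instead of the sample dict.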


answered Oct 18 '22 13:10

hodophile

Nice question.

First, note that spaCy uses the BILUO annotation scheme rather than the BIO scheme shown in your example. According to the spaCy documentation, the letters denote the following:

B: The first token of a multi-token entity.
I: An inner token of a multi-token entity.
L: The final token of a multi-token entity.
U: A single-token entity.
O: A non-entity token.
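As an illustration of the scheme above, here is a small pure-Python sketch that assigns one BILUO tag per token. It does not use spaCy; the function name spans_to_biluo and the convention of token-index spans with an exclusive end are assumptions made for this example.

```python
def spans_to_biluo(n_tokens, spans):
    """Return one BILUO tag per token.

    spans holds non-overlapping (start, end, label) triples, where start
    and end are token indices and end is exclusive.
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"U-{label}"  # single-token entity
        else:
            tags[start] = f"B-{label}"  # first token
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"  # inner tokens
            tags[end - 1] = f"L-{label}"  # last token
    return tags


tokens = ["Taj", "mahal", "is", "in", "Agra", "."]
spans = [(0, 2, "name"), (4, 5, "place")]
tags = spans_to_biluo(len(tokens), spans)
for token, tag in zip(tokens, tags):
    print(f"{token:<6} {tag}")
```

Under BIO, the same sentence would tag "mahal" as I-name and "Agra" as B-place, since BIO has no L or U tags.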

Then, some definitions:

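Precision = TP / (TP + FP): the share of predicted entities that are correct.
Recall = TP / (TP + FN): the share of gold entities that were found.
F1 = 2 * Precision * Recall / (Precision + Recall): the harmonic mean of the two.

Here TP, FP and FN are the numbers of true positive, false positive and false negative entities.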

spaCy has a built-in class for evaluating NER, called Scorer. It uses exact matching: a predicted entity counts as correct only if both its boundaries and its label match the gold annotation. The precision is returned as ents_p, the recall as ents_r and the F1 score as ents_f.

The only problem is that these scores cover all entity types in the document together. However, we can give the scorer only the entities of the type we want and get the scores for that type.
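To make exact matching and the per-type breakdown concrete, here is a minimal pure-Python sketch that computes precision, recall and F1 per label from sets of (start, end, label) spans. It is not spaCy's implementation; the function name prf_per_type and the prediction set below are made up for illustration.

```python
from collections import Counter


def prf_per_type(gold_docs, pred_docs):
    """Exact-match precision/recall/F1 per entity label.

    gold_docs and pred_docs are parallel lists with one set of
    (start, end, label) spans per document.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_docs, pred_docs):
        for span in pred:
            # A prediction is correct only if start, end and label all match
            if span in gold:
                tp[span[2]] += 1
            else:
                fp[span[2]] += 1
        for span in gold - pred:
            fn[span[2]] += 1  # gold entity that was not predicted exactly

    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        p = tp[label] / n_pred if n_pred else 0.0
        r = tp[label] / n_gold if n_gold else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = {"p": p, "r": r, "f": f, "support": n_gold}
    return scores


gold_docs = [{(0, 5, "PERSON"), (25, 32, "PERSON"), (35, 41, "GPE")}]
# Hypothetical predictions: the second PERSON span has a wrong end offset
pred_docs = [{(0, 5, "PERSON"), (25, 34, "PERSON"), (35, 41, "GPE")}]
results = prf_per_type(gold_docs, pred_docs)
for label, metrics in results.items():
    print(label, metrics)
```

The mis-bounded PERSON span counts as both a false positive and a false negative, which is what exact matching implies.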

All together, the code should look like this:

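# Targets spaCy v2: spacy.gold and GoldParse were removed in v3.
import spacy
from spacy.gold import GoldParse
from spacy.scorer import Scorer


def evaluate(nlp, examples, ent='PERSON'):
    """Score the pipeline on entities of a single type, `ent`."""
    scorer = Scorer()
    for text, annot in examples:
        # Keep only the gold entities whose label (third item) matches
        gold_entities = [e for e in annot.get('entities', []) if e[2] == ent]
        gold = GoldParse(nlp.make_doc(text), entities=gold_entities)
        pred = nlp(text)
        # Drop predictions of other types so they do not count as false positives
        pred.ents = [e for e in pred.ents if e.label_ == ent]
        scorer.score(pred, gold)
    return scorer.scores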


examples = [
    ("Trump says he's answered Mueller's Russia inquiry questions \u2013 live",{"entities":[[0,5,"PERSON"],[25,32,"PERSON"],[35,41,"GPE"]]}),
    ("Alexander Zverev reaches ATP Finals semis then reminds Lendl who is boss",{"entities":[[0,16,"PERSON"],[55,60,"PERSON"]]}),
    ("Britain's worst landlord to take nine years to pay off string of fines",{"entities":[[0,7,"GPE"]]}),
    ("Tom Watson: people's vote more likely given weakness of May's position",{"entities":[[0,10,"PERSON"],[56,59,"PERSON"]]}),
]

nlp = spacy.load('en_core_web_sm')
results = evaluate(nlp, examples)
print(results)

Call evaluate once per entity type, passing the label as the ent argument (for example ent='GPE'), to get the scores for that type.
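Running the evaluation for every label can be wrapped in a small loop. The sketch below is pure Python: collect_labels and evaluate_per_type are hypothetical helper names, and count_gold is only a stand-in so the snippet runs without spaCy. In real use you would pass the evaluate function from this answer and a loaded nlp object instead.

```python
def collect_labels(examples):
    """Return the sorted entity labels that occur in the annotations."""
    return sorted(
        {entity[2] for _, annot in examples for entity in annot.get("entities", [])}
    )


def evaluate_per_type(nlp, examples, evaluate_fn):
    """Call evaluate_fn(nlp, examples, ent=label) once per label."""
    return {
        label: evaluate_fn(nlp, examples, ent=label)
        for label in collect_labels(examples)
    }


def count_gold(nlp, examples, ent):
    """Stand-in for evaluate: counts the gold entities of type `ent`."""
    return sum(
        1 for _, annot in examples for entity in annot["entities"] if entity[2] == ent
    )


examples = [
    ("Alexander Zverev reaches ATP Finals semis then reminds Lendl who is boss",
     {"entities": [[0, 16, "PERSON"], [55, 60, "PERSON"]]}),
    ("Britain's worst landlord to take nine years to pay off string of fines",
     {"entities": [[0, 7, "GPE"]]}),
]
per_type = evaluate_per_type(None, examples, count_gold)
print(per_type)
```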

Hope it helps :)

answered Oct 18 '22 13:10

gdaras

Tags: python, entity, metrics, named-entity-recognition, spacy
