
Improving accuracy of NER in spaCy for a tag that does not follow one format [closed]

I am using a spaCy model for NER on my datasets. It tags B-ADDRESS and I-ADDRESS poorly, because my documents contain different types of addresses: some start with a number, some start with the name of a building, and some start with a PO box. Any idea how I can increase accuracy on the address tag?

Sheri asked Oct 19 '25 10:10



1 Answer

I can think of two quick approaches that you can try with spaCy before moving on to something different:

  1. Re-train the spaCy NER with your custom examples: If you have, for instance, a few hundred examples with real addresses, you can tag them manually and then re-train the spaCy NER so that it fits your particular address formats. You can train a new NER from scratch or fine-tune an existing one. I recommend you start by fine-tuning the "en_core_web_lg" NER. You can follow the official documentation on how to do so. Also, maybe this answer to a different question can be of some help.

  2. A fixed-rules approach: spaCy has a component called EntityRuler, which lets you apply rule-based matching over text. By adding patterns to this component (token-based rules, which can include regular expressions, or exact phrases), your pipeline can recognize, for example, your particular ADDRESS formats. You can read more about this here.

I hope this helps you!

EDIT

@polm23 is right, EntityRuler isn't just Regex. I edited the answer and added more info about this component from official docs:

The entity ruler lets you add spans to the Doc.ents using token-based rules or exact phrase matches. It can be combined with the statistical EntityRecognizer to boost accuracy, or used on its own to implement a purely rule-based entity recognition system. For usage examples, see the docs on rule-based entity recognition.
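As a rough illustration of both pattern types, here is a minimal sketch of a purely rule-based setup, assuming the spaCy v3 API and a blank English pipeline. The patterns, the street-suffix list, and the sample sentence are my own assumptions and would need to be adapted to your documents.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

patterns = [
    # Token-based rule: number, one or more capitalized words, street suffix.
    {
        "label": "ADDRESS",
        "pattern": [
            {"IS_DIGIT": True},
            {"IS_TITLE": True, "OP": "+"},
            {"LOWER": {"IN": ["street", "st", "avenue", "ave", "road", "rd"]}},
        ],
    },
    # Token-based rule: "PO Box" followed by a number.
    {
        "label": "ADDRESS",
        "pattern": [{"LOWER": "po"}, {"LOWER": "box"}, {"IS_DIGIT": True}],
    },
    # Exact phrase match for a known building name.
    {"label": "ADDRESS", "pattern": "Empire State Building"},
]
ruler.add_patterns(patterns)

doc = nlp("Send it to 221 Baker Street or PO Box 1234, not the Empire State Building.")
found = [(ent.text, ent.label_) for ent in doc.ents]
print(found)
```

In a pipeline that also has a statistical `ner` component, adding the ruler with `nlp.add_pipe("entity_ruler", before="ner")` gives the rule matches priority, and the statistical model then fills in around them.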

Emiliano Viotti answered Oct 21 '25 06:10
