I am using a spaCy model for NER on my datasets, and it is tagging B-ADDRESS and I-ADDRESS poorly. I think the reason is that my documents contain different types of addresses: some start with a number, some start with the name of a building, and some start with a PO box. Any idea how I can increase my accuracy on the ADDRESS tag?
I can think of two quick approaches that you can try with spaCy before moving on to something different:
Re-train spaCy NER with your custom examples: if you have, for instance, a few hundred examples with real addresses, you can tag them manually and then re-train the spaCy NER so that it deliberately overfits to your particular address formats. You can train a new NER from scratch or fine-tune an existing one; I recommend starting by fine-tuning the NER in "en_core_web_lg". The official documentation explains how to do so. Also, this answer to a different question may be of some help.
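A minimal fine-tuning sketch, assuming spaCy v3. The sentences in `TRAIN_DATA` and the helpers `make_example` and `load_pipeline` are hypothetical illustrations, one example per address style from the question; in practice you would use your own few hundred annotated sentences. If "en_core_web_lg" is not installed, the sketch falls back to training a new NER from a blank English pipeline, which is my addition rather than part of the original advice.

```python
import random

import spacy
from spacy.training import Example
from spacy.util import minibatch


def make_example(text, address, label="ADDRESS"):
    """Hypothetical helper: build spaCy's (text, {"entities": [...]}) format from a sentence and its address."""
    start = text.index(address)
    return text, {"entities": [(start, start + len(address), label)]}


# Hypothetical annotated sentences: number-first, building-first and PO box addresses.
TRAIN_DATA = [
    make_example("Send the invoice to 221 Baker Street today", "221 Baker Street"),
    make_example("Our office is in Maple House on the high street", "Maple House"),
    make_example("Please write to PO Box 1234 for a refund", "PO Box 1234"),
]


def load_pipeline(model_name="en_core_web_lg"):
    """Hypothetical helper: load a pretrained pipeline to fine-tune, else a blank one with a new NER."""
    try:
        return spacy.load(model_name), True
    except OSError:
        nlp = spacy.blank("en")
        nlp.add_pipe("ner")
        return nlp, False


nlp, pretrained = load_pipeline()
nlp.get_pipe("ner").add_label("ADDRESS")
examples = [Example.from_dict(nlp.make_doc(text), annots) for text, annots in TRAIN_DATA]

# Update only the NER; any other components of a pretrained pipeline stay frozen.
other_pipes = [name for name in nlp.pipe_names if name != "ner"]
with nlp.select_pipes(disable=other_pipes):
    if pretrained:
        optimizer = nlp.resume_training()  # keep the existing weights
    else:
        optimizer = nlp.initialize(get_examples=lambda: examples)  # new NER from scratch
    for epoch in range(20):
        random.shuffle(examples)
        losses = {}
        for batch in minibatch(examples, size=2):
            nlp.update(batch, sgd=optimizer, drop=0.2, losses=losses)
        print(epoch, losses)
```

When fine-tuning a pretrained pipeline on address-only data, it is usually wise to mix in sentences annotated with the model's other entity labels too, otherwise the NER can forget them.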
A fixed-rules approach: spaCy has a component called EntityRuler, which lets you do rule-based matching over text. By adding patterns to this component (token-based patterns, which can also use regular expressions on token text, or exact phrase matches), your pipeline can recognize, for example, your particular ADDRESS formats. You can read more about this here.
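A minimal rule-based sketch, assuming spaCy v3 and a blank English pipeline (so it runs without any trained model). The patterns, the `STREET_WORDS` / `BUILDING_WORDS` lists and the "Canary Wharf" phrase are hypothetical examples covering the three address styles from the question; real addresses will need many more patterns.

```python
import spacy

# Hypothetical keyword lists; extend them for your own data.
STREET_WORDS = ["street", "road", "avenue", "lane", "drive"]
BUILDING_WORDS = ["house", "tower", "building", "plaza", "court"]

ADDRESS_PATTERNS = [
    # Starts with a number, e.g. "221 Baker Street"; the REGEX also allows forms like "12B".
    {"label": "ADDRESS", "pattern": [
        {"TEXT": {"REGEX": r"^\d+[A-Za-z]?$"}},
        {"IS_TITLE": True, "OP": "+"},
        {"LOWER": {"IN": STREET_WORDS}},
    ]},
    # Starts with a building name, e.g. "Maple House".
    {"label": "ADDRESS", "pattern": [
        {"IS_TITLE": True, "OP": "+"},
        {"LOWER": {"IN": BUILDING_WORDS}},
    ]},
    # PO box, e.g. "PO Box 1234".
    {"label": "ADDRESS", "pattern": [
        {"LOWER": {"IN": ["po", "p.o."]}},
        {"LOWER": "box"},
        {"IS_DIGIT": True},
    ]},
    # A plain string is an exact phrase match.
    {"label": "ADDRESS", "pattern": "Canary Wharf"},
]

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(ADDRESS_PATTERNS)

for text in [
    "Send it to 221 Baker Street today",
    "Deliver to Maple House tomorrow",
    "Write to PO Box 1234 for details",
]:
    doc = nlp(text)
    print([(ent.text, ent.label_) for ent in doc.ents])
```

When several patterns overlap, the ruler keeps the longest match, so a broad pattern and a more specific one can coexist.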
I hope this helps you!
EDIT
@polm23 is right, EntityRuler isn't just regex. I edited the answer and added more info about this component from the official docs:
The entity ruler lets you add spans to the Doc.ents using token-based rules or exact phrase matches. It can be combined with the statistical EntityRecognizer to boost accuracy, or used on its own to implement a purely rule-based entity recognition system. For usage examples, see the docs on rule-based entity recognition.
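To combine the rules with the statistical model, one option is to add the ruler before the "ner" component, so the entity recognizer respects the spans the rules found and predicts around them. This is a sketch assuming spaCy v3; `build_pipeline` and the single PO box pattern are hypothetical, and if "en_core_web_lg" is not installed it falls back to a blank, rules-only pipeline.

```python
import spacy

# Hypothetical pattern; reuse your full ADDRESS pattern list here.
ADDRESS_PATTERNS = [
    {"label": "ADDRESS", "pattern": [{"LOWER": "po"}, {"LOWER": "box"}, {"IS_DIGIT": True}]},
]


def build_pipeline(model_name="en_core_web_lg"):
    """Hypothetical helper: rules plus statistical NER if available, else rules only."""
    try:
        nlp = spacy.load(model_name)
    except OSError:
        nlp = spacy.blank("en")
    if "ner" in nlp.pipe_names:
        # Rules run first; the NER then keeps these spans and fills in the rest.
        ruler = nlp.add_pipe("entity_ruler", before="ner")
    else:
        ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(ADDRESS_PATTERNS)
    return nlp


nlp = build_pipeline()
doc = nlp("Please write to PO Box 1234 for details")
print([(ent.text, ent.label_) for ent in doc.ents])
```

If you prefer to add the ruler after "ner" instead, passing `config={"overwrite_ents": True}` to `add_pipe` lets the rule matches replace overlapping model predictions.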