I installed spaCy on my system and I want to extract person and organization names from English text. But I see that there are four models for English, and the models are versioned. I don't understand which model is the large one, or which one I should choose for development.
By contrast, en_core_web_lg is 79 times larger than en_core_web_sm, so it loads much more slowly. I recommend using en_core_web_sm while developing and then switching to a larger model in production. Switching is easy: you only change the name of the model you load.
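A minimal sketch of that switch, assuming the model name is picked from an environment variable. The variable name APP_ENV and the helper names pick_model_name and load_pipeline are hypothetical, chosen only for illustration; spacy.load is the actual spaCy call, and both pipelines must already be installed (for example with `python -m spacy download en_core_web_sm`).

```python
import os

DEV_MODEL = "en_core_web_sm"   # small pipeline, quick to load while developing
PROD_MODEL = "en_core_web_lg"  # large pipeline, slower to load


def pick_model_name(env=None):
    """Return the pipeline name for the current environment.

    APP_ENV is a hypothetical variable name used only in this sketch.
    """
    env = os.environ if env is None else env
    return PROD_MODEL if env.get("APP_ENV") == "production" else DEV_MODEL


def load_pipeline(env=None):
    """Load whichever pipeline pick_model_name selects."""
    import spacy  # imported lazily so the name-picking logic works without spaCy

    return spacy.load(pick_model_name(env))
```

The rest of the code only ever sees the object returned by load_pipeline(), so nothing else has to change when the model does.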
For example, en_core_web_sm is a small English pipeline trained on written web text (blogs, news, comments) that includes vocabulary, syntax and entities.
en_core_web_lg is the largest English model in spaCy, at 788 MB. There are smaller English models, as well as models for other languages (German, French, Spanish, Portuguese, Italian, Dutch, Greek).
EDIT Feb 2021: spaCy version 3 now also offers pipelines based on the Transformer architecture (for example en_core_web_trf), alongside the sm/md/lg models.
sm/md/lg refer to the sizes of the models (small, medium, and large, respectively).
As it says on the models page you linked to,
Model differences are mostly statistical. In general, we do expect larger models to be "better" and more accurate overall. Ultimately, it depends on your use case and requirements. We recommend starting with the default models (marked with a star below).
FWIW, the sm model is the default (as alluded to above).
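For the use case in the question (person and organization names), here is a rough sketch using the default sm model. It assumes en_core_web_sm is installed and relies on the PERSON and ORG labels that the English pipelines assign; the helper names people_and_orgs and demo, and the sample sentence, are made up for illustration.

```python
WANTED_LABELS = {"PERSON", "ORG"}


def people_and_orgs(doc):
    """Return (text, label) pairs for the PERSON and ORG entities in a spaCy Doc."""
    return [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in WANTED_LABELS]


def demo(text="Tim Cook is the CEO of Apple."):
    """Run the small English pipeline on a sample sentence (hypothetical input)."""
    import spacy  # requires spaCy and the en_core_web_sm package

    nlp = spacy.load("en_core_web_sm")  # swap in md or lg here to compare results
    return people_and_orgs(nlp(text))
```

Calling demo() prints nothing by itself; print its return value to see what the model found. Results can differ between sm, md and lg, which is one practical way to decide whether the larger model is worth its load time.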