
What is the difference between spacy.lang.en and load('en')?

While studying NLP, and the spaCy library in particular, I got confused about one thing: what is the difference between from spacy.lang.en import English (followed by English()) and spacy.load('en'), and how does each one work? Can someone explain this, ideally with an example that shows the difference? Thanks in advance.

Theodoro Caliari asked Oct 15 '22 10:10



1 Answer

The English language class in spacy.lang.en contains the language-specific code and rules included in the library – for example, special case rules for tokenization, stop words or functions to decide whether a word like "twenty two" resembles a number.
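To make this concrete, here is a minimal sketch of using the language class on its own. It assumes spaCy is installed; the example sentence and the variable names are just for illustration. Because no trained model is involved, only the rule-based parts (tokenization, stop words, lexical attributes such as like_num) are available.

```python
from spacy.lang.en import English

# Instantiating the language class directly gives a "blank" pipeline:
# tokenizer and language-specific rules only, no trained components.
nlp_blank = English()
doc = nlp_blank("I don't have twenty apples.")

# Tokenization uses the English special-case rules (e.g. splitting "don't").
tokens = [token.text for token in doc]
print(tokens)

# like_num is a rule-based lexical attribute, so it works without a model.
number_like = [token.text for token in doc if token.like_num]
print(number_like)

# The stop word list ships with the language data.
print("the" in nlp_blank.Defaults.stop_words)

# No pipeline components have been added.
print(nlp_blank.pipe_names)
```

Attributes that depend on model predictions, such as token.pos_ or doc.ents, will be empty on a doc produced this way.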

spacy.load("en") loads the installed statistical model with the shortcut name en – in this case, the en_core_web_sm package. So you could also run spacy.load("en_core_web_sm"), which makes things a bit more explicit. Loading a model will initialize the respective language class (in this case, English), set up the processing pipeline and load in the binary weights of the trained model that allow spaCy to make predictions (e.g. whether a word is a noun or what named entities are in the text). So the nlp object you get back after loading a model is an instance of English, but it also has a processing pipeline set up and weights loaded in.
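As a rough sketch of the loading side, the snippet below loads the model by its full package name and inspects the result. It assumes en_core_web_sm may or may not be installed, so try_load is a hypothetical helper (not part of spaCy) that returns None instead of failing. Note that, as far as I know, spaCy v3 no longer supports shortcut names like "en", so the full package name is the safer choice.

```python
import spacy
from spacy.lang.en import English


def try_load(name="en_core_web_sm"):
    """Hypothetical helper: return the loaded pipeline, or None if missing."""
    try:
        return spacy.load(name)
    except OSError:
        # spaCy raises OSError when it cannot find the model package.
        return None


nlp = try_load()
if nlp is None:
    print("Model not installed; try: python -m spacy download en_core_web_sm")
else:
    # The loaded object is still an instance of the English language class...
    print(isinstance(nlp, English))
    # ...but it now has pipeline components with trained weights.
    print(nlp.pipe_names)

    doc = nlp("Apple is looking at buying a U.K. startup.")
    # Predictions such as part-of-speech tags and entities need the model.
    print([(token.text, token.pos_) for token in doc])
    print([(ent.text, ent.label_) for ent in doc.ents])
```

The exact component names and predictions depend on the installed spaCy and model versions.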

The spaCy documentation has a more detailed overview of how spacy.load works under the hood. The first chapter of the spaCy online course also explains the language classes and statistical models in more detail.

Ines Montani answered Nov 15 '22 09:11
