What is the difference between spacy.load('en_core_web_sm') and spacy.load('en')? This link explains the different model sizes, but I am still not clear on how spacy.load('en_core_web_sm') and spacy.load('en') differ.
spacy.load('en') runs fine for me, but spacy.load('en_core_web_sm') throws an error.
I have installed spaCy as shown below. When I go to a Jupyter notebook and run nlp = spacy.load('en_core_web_sm'), I get the error below.
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-4-b472bef03043> in <module>()
      1 # Import spaCy and load the language library
      2 import spacy
----> 3 nlp = spacy.load('en_core_web_sm')
      4
      5 # Create a Doc object

C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\spacy\__init__.py in load(name, **overrides)
     13     if depr_path not in (True, False, None):
     14         deprecation_warning(Warnings.W001.format(path=depr_path))
---> 15     return util.load_model(name, **overrides)
     16
     17

C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\spacy\util.py in load_model(name, **overrides)
    117     elif hasattr(name, 'exists'):  # Path or Path-like to model data
    118         return load_model_from_path(name, **overrides)
--> 119     raise IOError(Errors.E050.format(name=name))
    120
    121

OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
How I installed spaCy:
(C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder) C:\Users\nikhizzz>conda install -c conda-forge spacy
Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder:

The following NEW packages will be INSTALLED:

    blas:           1.0-mkl
    cymem:          1.31.2-py35h6538335_0 conda-forge
    dill:           0.2.8.2-py35_0 conda-forge
    msgpack-numpy:  0.4.4.2-py_0 conda-forge
    murmurhash:     0.28.0-py35h6538335_1000 conda-forge
    plac:           0.9.6-py_1 conda-forge
    preshed:        1.0.0-py35h6538335_0 conda-forge
    pyreadline:     2.1-py35_1000 conda-forge
    regex:          2017.11.09-py35_0 conda-forge
    spacy:          2.0.12-py35h830ac7b_0 conda-forge
    termcolor:      1.1.0-py_2 conda-forge
    thinc:          6.10.3-py35h830ac7b_2 conda-forge
    tqdm:           4.29.1-py_0 conda-forge
    ujson:          1.35-py35hfa6e2cd_1001 conda-forge

The following packages will be UPDATED:

    msgpack-python: 0.4.8-py35_0 --> 0.5.6-py35he980bc4_3 conda-forge

The following packages will be DOWNGRADED:

    freetype:       2.7-vc14_2 conda-forge --> 2.5.5-vc14_2

Proceed ([y]/n)? y

blas-1.0-mkl.t 100% |###############################| Time: 0:00:00 0.00 B/s
cymem-1.31.2-p 100% |###############################| Time: 0:00:00 1.65 MB/s
msgpack-python 100% |###############################| Time: 0:00:00 5.37 MB/s
murmurhash-0.2 100% |###############################| Time: 0:00:00 1.49 MB/s
plac-0.9.6-py_ 100% |###############################| Time: 0:00:00 0.00 B/s
pyreadline-2.1 100% |###############################| Time: 0:00:00 4.62 MB/s
regex-2017.11. 100% |###############################| Time: 0:00:00 3.31 MB/s
termcolor-1.1. 100% |###############################| Time: 0:00:00 187.81 kB/s
tqdm-4.29.1-py 100% |###############################| Time: 0:00:00 2.51 MB/s
ujson-1.35-py3 100% |###############################| Time: 0:00:00 1.66 MB/s
dill-0.2.8.2-p 100% |###############################| Time: 0:00:00 4.34 MB/s
msgpack-numpy- 100% |###############################| Time: 0:00:00 0.00 B/s
preshed-1.0.0- 100% |###############################| Time: 0:00:00 0.00 B/s
thinc-6.10.3-p 100% |###############################| Time: 0:00:00 5.49 MB/s
spacy-2.0.12-p 100% |###############################| Time: 0:00:10 7.42 MB/s

(C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder) C:\Users\nikhizzz>python -V
Python 3.5.3 :: Anaconda custom (64-bit)

(C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder) C:\Users\nikhizzz>python -m spacy download en
Collecting en_core_web_sm==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm==2.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
    100% |################################| 37.4MB ...
Installing collected packages: en-core-web-sm
  Running setup.py install for en-core-web-sm ... done
Successfully installed en-core-web-sm-2.0.0

    Linking successful
    C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\en_core_web_sm -->
    C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\spacy\data\en

    You can now load the model via spacy.load('en')

(C:\Users\nikhizzz\AppData\Local\conda\conda\envs\tensorflowspyder) C:\Users\nikhizzz>
If you get the error "It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory", download the packages with python -m spacy download en_core_web_lg and python -m spacy download en_core_web_sm, and then run python -m spacy download en. That solved the error for me.
For example, en_core_web_sm is a small English pipeline trained on written web text (blogs, news, comments) that includes vocabulary, syntax and entities.
Initially I downloaded two English packages by running the following commands in the Anaconda prompt:
python -m spacy download en_core_web_lg
python -m spacy download en_core_web_sm
But I kept getting a linkage error. Running the command below finally established the link and solved the error:
python -m spacy download en
Also make sure you restart your runtime if you are working with Jupyter. PS: If you get a linkage error, try running the command with admin privileges.
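Since restarting the runtime came up, a quick check from inside the notebook can show which interpreter the kernel is actually using and whether it can see the model package. This is only a sketch using the standard library; describe_environment is a made-up helper name, not something spaCy provides, and it does not load spaCy at all.

```python
import importlib.util
import sys


def describe_environment(model_name):
    """Report the running interpreter and whether `model_name` is importable.

    `describe_environment` is a hypothetical helper, not part of spaCy.
    """
    try:
        spec = importlib.util.find_spec(model_name)
    except (ImportError, ValueError):
        # Malformed or unresolvable names are treated as "not importable".
        spec = None
    return {
        "python": sys.executable,  # the interpreter this kernel runs on
        "model_importable": spec is not None,
    }


print(describe_environment("en_core_web_sm"))
```

If the "python" path is not the interpreter of the environment you installed into, the notebook is running on a different environment than the one where the download happened.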
The answer to your misunderstanding is a Unix concept: soft links (symbolic links), which are roughly similar to shortcuts in Windows. Let me explain.
When you run spacy download en, spaCy tries to find the best small model that matches your spaCy distribution. That small model defaults to en_core_web_sm, which exists in different variations corresponding to different spaCy versions (for example, spacy and spacy-nightly have en_core_web_sm packages of different sizes).
When spaCy finds the best model for you, it downloads it and then links the name en to the package it downloaded, e.g. en_core_web_sm. That basically means that whenever you refer to en, you are referring to en_core_web_sm. In other words, after linking, en is not a "real" package; it is just a name for en_core_web_sm.
However, it doesn't work the other way: the link only maps en to en_core_web_sm, not the reverse. When you ran spacy download en, spaCy did a pip install of the package en_core_web_sm and then created the link en inside its own data directory (spacy\data\en in the install log above). So en is never a package that pip knows about; it is just a soft link to en_core_web_sm. Loading 'en' goes through that link, whereas loading 'en_core_web_sm' directly only works if the running Python process can find en_core_web_sm as an installed package.
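The three things named in the E050 message (shortcut link, Python package, path) suggest a lookup roughly like the one below. This is a simplified sketch of the idea, not spaCy's actual code: resolve_model_name and the data_dir argument are names invented here, the order of the checks is an assumption, and a plain directory stands in for the soft link.

```python
import importlib.util
import tempfile
from pathlib import Path


def resolve_model_name(name, data_dir):
    """Return (kind, target) for how `name` would be found, or raise OSError."""
    # 1. A shortcut link such as <data_dir>/en that points at the real model.
    #    Only bare names (no directory components) can be shortcut links.
    if Path(name).name == name:
        link = Path(data_dir) / name
        if link.exists():
            return ("shortcut link", str(link))
    # 2. An installed Python package such as en_core_web_sm.
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    if spec is not None:
        return ("package", name)
    # 3. A filesystem path to a model directory.
    if Path(name).exists():
        return ("path", name)
    raise OSError("Can't find model '{}'".format(name))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir:
        (Path(data_dir) / "en").mkdir()  # stand-in for the 'en' link
        print(resolve_model_name("en", data_dir)[0])  # prints: shortcut link
```

In this picture, 'en' is found in step 1, while 'en_core_web_sm' has to succeed in step 2, which depends on what the running interpreter can import.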
Of course, you can download en_core_web_sm directly with python -m spacy download en_core_web_sm, or you can link the name en to other models as well. For example, you could run python -m spacy download en_core_web_lg and then python -m spacy link en_core_web_lg en. That would make en a name for en_core_web_lg, which is a large spaCy model for the English language.
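If you are not sure which of the names is available on a given machine, one option is to try them in order. The sketch below assumes that spacy.load raises OSError for a name it cannot find, as in the traceback in the question. load_first_available and its loader parameter are hypothetical names; loader exists only so the logic can be tried without spaCy installed.

```python
def load_first_available(names, loader=None):
    """Try each model name in turn and return (name, loaded_model).

    `load_first_available` is a hypothetical helper, not part of spaCy.
    """
    if loader is None:
        import spacy  # imported lazily so the helper works without spaCy in tests
        loader = spacy.load
    errors = []
    for name in names:
        try:
            return name, loader(name)
        except OSError as err:
            # Remember why this name failed and move on to the next one.
            errors.append("{}: {}".format(name, err))
    raise OSError("None of the models could be loaded:\n" + "\n".join(errors))
```

In a notebook this could be called as name, nlp = load_first_available(["en_core_web_sm", "en"]), which also tells you which of the two names actually resolved.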
Hope it is clear now :)