
Where does spacy language model download?

Tags: python, spacy

I have a simple command:

python -m spacy download en_core_web

And I cannot for the life of me figure out where it downloads to. I searched for "en_core_web" but found absolutely nothing, anywhere. I also can't figure out what to search for to understand the syntax behind this command.

What do you even call this line? A python command line argument? I couldn't find what to search for to specify a download location.

Please help!

Josh Flori asked Oct 30 '19 00:10


2 Answers

I stumbled across the same question. The model path can be read from the _path attribute of a loaded spaCy model.

For instance, having completed the model download at the command line as follows:
python -m spacy download en_core_web_sm

then within the python shell:

import spacy
model = spacy.load("en_core_web_sm")
model._path

This will show you where the model has been installed in your system.
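
As an alternative that avoids the private _path attribute: models fetched with python -m spacy download are installed as ordinary Python packages, so the standard library can locate them too. This is only a minimal sketch. package_dir is a hypothetical helper name, and it assumes the model was installed into the same environment that runs the code.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str) -> Optional[Path]:
    """Return the directory of an installed top-level package, or None.

    Hypothetical helper for illustration; it only asks Python's import
    system where the package lives and does not load any model data.
    """
    spec = importlib.util.find_spec(name)
    # find_spec returns None when the package is not installed;
    # origin is None for namespace packages, which have no single file.
    if spec is None or spec.origin is None:
        return None
    # origin points at the package's __init__.py, so take its parent.
    return Path(spec.origin).parent


if __name__ == "__main__":
    # Prints the install directory, or None if the model is not installed.
    print(package_dir("en_core_web_sm"))
```

If this prints None, the model is most likely installed in a different environment (for example another virtualenv) than the one you are running.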

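If you want to download to a different location, I believe older spaCy 1.x releases let you write the following at the command line (the current python -m spacy download command installs the model as a regular Python package instead, so this may not work for you):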
python -m spacy.en.download en_core_web_sm --data-path /some/dir

Hope that helps

Matt answered Oct 11 '22 20:10


I can't find any evidence that spaCy pays attention to the $SPACY_DATA_DIR environment variable, nor can I get the --data-path or model.path (--model.path?) parameters mentioned above to work when trying to download models to a particular place. This was an issue for me because I was trying to keep the models out of a Docker image, so that they could be persisted or updated easily without rebuilding the image.

I eventually came to the following solution for using pre-trained models:

  1. Run the download command as normal (i.e. python -m spacy download en_core_web_lg)
  2. In Python: import spacy and then nlp = spacy.load('en_core_web_lg')
  3. Now save this to the place you want it: nlp.to_disk('path/to/dir')
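
The three steps above can be sketched as a small script, for example to run once against a mounted volume. This is a rough sketch under assumptions, not an official spaCy recipe: export_model and its load parameter are hypothetical names introduced here (load exists only so the function can be exercised without spaCy installed), and the model must already have been downloaded in step 1.

```python
from pathlib import Path


def export_model(name, target_dir, load=None):
    """Load an installed model by name and save a copy under target_dir.

    Hypothetical helper for illustration. `load` defaults to spacy.load;
    passing another callable is only meant for testing.
    """
    if load is None:
        import spacy  # imported lazily so the module imports without spaCy
        load = spacy.load

    target = Path(target_dir)
    # Create the destination (e.g. a directory on a Docker volume).
    target.mkdir(parents=True, exist_ok=True)

    nlp = load(name)      # step 2: load the installed model
    nlp.to_disk(target)   # step 3: write it to the chosen location
    return target


if __name__ == "__main__":
    # 'path/to/dir' is the placeholder path from the steps above.
    saved = export_model("en_core_web_lg", "path/to/dir")
    print("Model saved to", saved)
```

Afterwards the saved directory can be passed straight to spacy.load, so the image itself does not need to contain the model.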

You can now load this from the local directory via nlp = spacy.load('path/to/dir'). There's a suggestion in the documentation that you can download the models manually:

"You can place the model data directory anywhere on your local file system. To use it with spaCy, simply assign it a name by creating a shortcut link for the data directory."

But I can't make sense of what this means in practice (I have submitted an 'issue' to spaCy).

Hope this helps anyone else trying to do something similar.

JonR answered Oct 11 '22 21:10