
spaCy needs a file that is not there: strings.json

I am running pytextrank, and in its second stage I get this error from spaCy:

File "C:\Anaconda3\lib\pathlib.py", line 371, in wrapped
    return strfunc(str(pathobj), *args)

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Anaconda3\\lib\\site-packages\\spacy\\data\\en\\vocab\\strings.json'

I looked for strings.json, but no such file exists.

Interestingly, a similar error involving pathlib.py came up when I installed spaCy, with the following message:

OSError: Symbolic link privilege not held

Does anyone have an idea? Thanks.

Peter asked Mar 26 '17 02:03



2 Answers

Finally, I can answer a question on Stack Overflow. I ran into the same problem and eventually solved it. Here is my suggestion:

1. Download a spaCy model, either with python -m spacy or from GitHub

Both ways are convenient.

1). Using python -m spacy:

python3 -m spacy download en

Assuming you are using Python 3, this runs automatically and installs the model as a new package, which you can then import via import en or load with spacy.load('en').

2). From GitHub:

Follow the link, select the newest version, and download it.

2. Link your downloaded model (only needed if you did not use the python -m spacy route and have to link the model manually)

This is the most important part. You must unpack the downloaded tar or gzip archive to get a folder. However, the extracted top-level folder is not yet the path you want to link.

.
├── en_core_web_md-1.2.1
│   ├── deps
│   │   ├── config.json
│   │   └── model
│   ├── meta.json
│   ├── ner
│   │   ├── config.json
│   │   └── model
│   ├── pos
│   │   ├── config.json
│   │   └── model
│   └── vocab
│       ├── gazetteer.json
│       ├── lexemes.bin
│       ├── oov_prob
│       ├── serializer.json
│       ├── strings.json
│       └── vec.bin

You must link the folder that has the structure shown above (here, en_core_web_md-1.2.1). spaCy will then find that folder through the shortcut name you give it.

Here is the link script you need:
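Before linking, it can help to confirm that the folder you are about to link really has this layout. The following is a minimal sketch, not part of spaCy: the helper name missing_model_files and the EXPECTED_FILES list are assumptions made for illustration, with the file names taken from the directory listing above. Other model versions may ship different files.

```python
from pathlib import Path

# File names copied from the directory listing above. This list is an
# assumption: other model versions may contain different files.
EXPECTED_FILES = [
    "meta.json",
    "vocab/strings.json",
    "vocab/lexemes.bin",
]


def missing_model_files(model_dir):
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical location: the unpacked folder in the current directory.
    missing = missing_model_files("en_core_web_md-1.2.1")
    if missing:
        print("Not a complete model folder, missing:", missing)
    else:
        print("Folder looks complete.")
```

If the list comes back non-empty, you are probably pointing at the wrong level of the unpacked archive.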

# Run this from the directory that contains the unpacked model folder.
base_path="$(pwd)"
sudo python3 -m spacy link "${base_path}/en_core_web_md-1.2.1" en_core_web --force

You can save this as a .sh file alongside that folder and run it.

That's it!

Nicholas Jela answered Oct 21 '22 17:10


The Symbolic link privilege not held error usually occurs when you've installed spaCy and the models into a system directory, but your user does not have the required permissions to create symbolic links. To solve this, either run download or link again as administrator or, if that's not possible, use a virtualenv to install everything into a user directory instead (for more info on this, see the troubleshooting docs).
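To find out whether your current user is allowed to create symbolic links in a given directory, you can probe it directly. This is only a sketch using the Python standard library. The function name can_create_symlinks is made up for illustration, and the check does not go through spaCy's own linking code.

```python
import os
import tempfile


def can_create_symlinks(directory):
    """Return True if a directory symlink can be created inside directory.

    Returns False if the OS refuses (for example with "symbolic link
    privilege not held" on Windows) or if the directory is missing or
    not writable.
    """
    try:
        # Work in a scratch folder so nothing is left behind afterwards.
        with tempfile.TemporaryDirectory(dir=directory) as scratch:
            target = os.path.join(scratch, "target")
            os.mkdir(target)
            os.symlink(target, os.path.join(scratch, "link"),
                       target_is_directory=True)
    except (OSError, NotImplementedError):
        return False
    return True
```

Pointing it at the spacy/data directory of your installation (in the question, C:\Anaconda3\lib\site-packages\spacy\data) shows whether linking can work without administrator rights.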

As of v1.7.0, spaCy creates symlinks (also called shortcut links) in the spacy/data directory. This makes it easier to store your models wherever you want, install them as Python packages and load them using custom names, e.g. spacy.load('my_model').

What likely happened in your case is that spaCy failed to set up this link because of the permissions error, and now can't find and load the model, including vocab/strings.json. (The way spaCy failed here is not ideal, though. This has since been fixed in v1.7.3.)
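One way to confirm this diagnosis is to inspect the shortcut inside spacy/data yourself. The sketch below uses only pathlib. The function check_shortcut and its return strings are hypothetical names chosen for illustration, and the path layout (data directory, then shortcut name, then vocab/strings.json) is taken from the error message in the question.

```python
from pathlib import Path


def check_shortcut(data_dir, name="en"):
    """Classify the state of the shortcut <data_dir>/<name>."""
    link = Path(data_dir) / name
    if link.is_symlink() and not link.exists():
        return "broken link"      # symlink exists, but its target is gone
    if not link.exists():
        return "missing"          # no link or folder was ever created
    if not (link / "vocab" / "strings.json").is_file():
        return "no strings.json"  # link resolves, but not to a full model
    return "ok"


if __name__ == "__main__":
    # Path from the question; adjust it to your own installation.
    print(check_shortcut(r"C:\Anaconda3\lib\site-packages\spacy\data", "en"))
```

A result of "missing" or "broken link" would match the failed-link explanation given here.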

Since the model is already installed, all you'd have to do is create a new symlink for it (either as admin, or in a virtualenv):

python -m spacy link en_core_web_sm en

(If you've downloaded a different model, simply replace en_core_web_sm with the name of that model. en is the shortcut to use and can be any name you want.)

Edit: In case you only want to use the tokenizer and don't care about the models, or want to use one of the supported languages that don't yet come with a statistical model, you can also just import the Language class in v1.7.3:

from spacy.fr import French
nlp = French()

Ines Montani answered Oct 21 '22 17:10