I have trained my own word2vec model in gensim and I am trying to load that model in spacy. First, I need to save it in my disk and then try to load an init-model in spacy but unable to figure out exactly how.
gensimmodel
Out[252]:
<gensim.models.word2vec.Word2Vec at 0x110b24b70>
import spacy
spacy.load(gensimmodel)
OSError: [E050] Can't find model 'Word2Vec(vocab=250, size=1000, alpha=0.025)'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
Spacy is a natural language processing library for Python designed to have fast performance, and with word embedding models built in. Gensim is a topic modelling library for Python that provides modules for training Word2Vec and other word embedding algorithms, and allows using pre-trained models. This tutorial works with Python3.
We will use this list to create our Word2Vec model with the Gensim library. With Gensim, it is extremely straightforward to create Word2Vec model. The word list is passed to the Word2Vec class of the gensim.models package. We need to specify the value for the min_count parameter.
As explained here, you can import custom word vectors that trained using Gensim, Fast Text, or Tomas Mikolov's original word2vec implementation, by creating a model using: then you can load you model, nlp = spacy.load ('your_model') and use it!
Gensim is an open-source python library for natural language processing. Working with Word2Vec in Gensim is the easiest option for beginners due to its high-level API for training your own CBOW and SKip-Gram model or running a pre-trained word2vec model.
Train and save your model in plain-text format:
from gensim.test.utils import common_texts, get_tmpfile
from gensim.models import Word2Vec
path = get_tmpfile("./data/word2vec.model")
model = Word2Vec(common_texts, size=100, window=5, min_count=1, workers=4)
model.wv.save_word2vec_format("./data/word2vec.txt")
Gzip the text file:
gzip word2vec.txt
Which produces a word2vec.txt.gz
file.
Run the following command:
python -m spacy init-model en ./data/spacy.word2vec.model --vectors-loc word2vec.txt.gz
Load the vectors using:
nlp = spacy.load('./data/spacy.word2vec.model/')
As explained here, you can import custom word vectors that trained using Gensim, Fast Text, or Tomas Mikolov's original word2vec implementation, by creating a model using:
wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.la.300.vec.gz
python -m spacy init-model en your_model --vectors-loc cc.la.300.vec.gz
then you can load you model, nlp = spacy.load('your_model')
and use it!
Also see the similar question that answered here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With