 

In spacy, how to use your own word2vec model created in gensim?

I have trained my own word2vec model in gensim and I am trying to load that model in spaCy. I understand that I first need to save it to disk and then load it through spaCy's init-model command, but I am unable to figure out exactly how.

gensimmodel
Out[252]:
<gensim.models.word2vec.Word2Vec at 0x110b24b70>

import spacy
spacy.load(gensimmodel)

OSError: [E050] Can't find model 'Word2Vec(vocab=250, size=1000, alpha=0.025)'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
Subigya Upadhyay asked May 22 '18 11:05



2 Answers

Train and save your model in plain-text format:

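from gensim.test.utils import common_texts
from gensim.models import Word2Vec

# `size` is the gensim 3.x parameter name; gensim 4.x renamed it to `vector_size`.
model = Word2Vec(common_texts, size=100, window=5, min_count=1, workers=4)

# Write only the word vectors in the plain-text word2vec format.
# The ./data directory must already exist.
model.wv.save_word2vec_format("./data/word2vec.txt")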
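
For orientation, the plain-text word2vec layout is a header line with the vocabulary size and the vector dimension, followed by one line per word. The sketch below writes that layout by hand for a toy dictionary; `write_word2vec_text` and the `toy` data are hypothetical names used only for illustration and are not part of gensim.

```python
import io


def write_word2vec_text(vectors, stream):
    """Write a {word: [floats]} mapping in the word2vec plain-text layout."""
    dims = {len(vec) for vec in vectors.values()}
    if len(dims) != 1:
        raise ValueError("all vectors must share one dimensionality")
    dim = dims.pop()
    # Header line: "<vocab_size> <vector_dim>"
    stream.write(f"{len(vectors)} {dim}\n")
    # One line per word: the token, then its components separated by spaces
    for word, vec in vectors.items():
        stream.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")


# Toy data, made up for the example
toy = {"human": [0.1, -0.2, 0.3], "computer": [0.0, 0.5, -0.5]}
buf = io.StringIO()
write_word2vec_text(toy, buf)
print(buf.getvalue())
```

Opening the file written by save_word2vec_format in a text editor should show the same shape, which is a quick way to check the export before handing it to spaCy.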

Gzip the text file:

gzip ./data/word2vec.txt

This produces ./data/word2vec.txt.gz.
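
If the gzip command is not available (for example on Windows), the same step can be sketched with Python's standard library. `gzip_file` is a hypothetical helper name. Unlike the command-line tool, which removes the original file by default, this version keeps it unless told otherwise.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src, keep_original=True):
    """Compress `src` to `src` + '.gz' and return the path of the new file."""
    src = Path(src)
    dst = src.with_name(src.name + ".gz")
    # Stream the bytes across so large vector files are not loaded into memory
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    if not keep_original:
        src.unlink()
    return dst
```

Calling gzip_file("./data/word2vec.txt") would then write ./data/word2vec.txt.gz next to the original.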

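Run the following command to build a spaCy model directory from the vectors (this is the spaCy v2.x syntax; in v3 init-model was replaced by init vectors):

python -m spacy init-model en ./data/spacy.word2vec.model --vectors-loc ./data/word2vec.txt.gz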
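
Because the argument order differs between spaCy v2 and v3, a script that has to work with either can build the command from the major version. This is only a sketch: `spacy_init_vectors_cmd` is a hypothetical helper, and it assumes the v2 form `init-model <lang> <output_dir> --vectors-loc <file>` and the v3 form `init vectors <lang> <file> <output_dir>`.

```python
import sys


def spacy_init_vectors_cmd(spacy_major, lang, vectors_loc, output_dir):
    """Return the argv list that converts a vectors file into a spaCy model directory."""
    base = [sys.executable, "-m", "spacy"]
    if spacy_major >= 3:
        # v3: python -m spacy init vectors <lang> <vectors_loc> <output_dir>
        return base + ["init", "vectors", lang, vectors_loc, output_dir]
    # v2: python -m spacy init-model <lang> <output_dir> --vectors-loc <vectors_loc>
    return base + ["init-model", lang, output_dir, "--vectors-loc", vectors_loc]
```

The returned list can be passed to subprocess.run(cmd, check=True), and the major version can be read from int(spacy.__version__.split(".")[0]).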

Load the vectors using:

import spacy
nlp = spacy.load('./data/spacy.word2vec.model/')
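
As a possible alternative that skips the file round trip, the vectors can be copied into a spaCy vocabulary in memory. The sketch below is written against two assumptions: the gensim object exposes its word list as `index_to_key` (gensim 4.x) or `index2word` (gensim 3.x) and supports `kv[word]` lookup, and the spaCy vocabulary provides `set_vector(word, vector)`. `copy_vectors` is a hypothetical helper, not part of either library.

```python
def copy_vectors(keyed_vectors, vocab):
    """Copy every word vector from a gensim KeyedVectors-like object into a spaCy-like vocab."""
    # gensim 4.x names the word list `index_to_key`; gensim 3.x used `index2word`
    words = getattr(keyed_vectors, "index_to_key", None)
    if words is None:
        words = keyed_vectors.index2word
    for word in words:
        vocab.set_vector(word, keyed_vectors[word])
    return len(words)
```

With the objects from the question this would be called as copy_vectors(gensimmodel.wv, nlp.vocab), where nlp = spacy.blank("en"). Depending on the spaCy version, the vector table may need to be sized first, so treat this as a starting point rather than a drop-in replacement for init-model.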
hbot answered Nov 03 '22 00:11


As explained here, you can import custom word vectors that were trained using Gensim, fastText, or Tomas Mikolov's original word2vec implementation by creating a model with:

wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.la.300.vec.gz
python -m spacy init-model en your_model --vectors-loc cc.la.300.vec.gz

Then you can load your model with nlp = spacy.load('your_model') and use it.

Also see the similar question answered here.

Ali Zarezade answered Nov 03 '22 00:11