I want to add new words to a trained gensim word2vec model using a new text dataset. However, I want to preserve the old word embeddings and only add the new words from the dataset to the existing model. This means simply retraining the old model on the new text dataset isn't an option, as it would readjust the vectors of the existing words that also occur in the new dataset. Can you give any suggestions for this task? I would like something like gensim's doc2vec infer feature, where you feed the model some text input and it gives a vector as output. Thanks.
I would do the following (pseudo-Python). Note the original averaging was buggy: `np.array(a, b)` treats its second argument as a dtype, and seeding the average with a zero vector dilutes the result. Instead, collect the synonym embeddings first and take their mean:

for word in new_words:
    # find words that should be nearby
    synonyms = thesaurus.lookup(word)
    # collect the embeddings of synonyms that already exist in the model
    syn_vectors = []
    for syn in synonyms:
        if w2v.get_embedding(syn) is not None:
            syn_vectors.append(w2v.get_embedding(syn))
    # average the synonym embeddings to get the new word's vector
    if syn_vectors:
        new_word_embedding = np.mean(syn_vectors, axis=0)