 

PyTorch / Gensim - How to load pre-trained word embeddings

I want to load a pre-trained word2vec embedding with gensim into a PyTorch embedding layer.

My question is: how do I get the embedding weights loaded by gensim into a PyTorch embedding layer?

Thanks in advance!

asked Apr 07 '18 by MBT

People also ask

Is Gensim used for word embedding?

The gensim Python library supports an implementation of the Word2Vec word embedding algorithm for learning new word vectors from text. It also provides tools for loading pre-trained word embeddings in a few formats, and for using and querying a loaded embedding.
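For illustration, a minimal sketch of both sides of that: training new vectors on a toy corpus and saving them in a format that can be reloaded later. The parameter names assume gensim 4.x (vector_size was called size in 3.x), and the corpus and filename are made up:

from gensim.models import Word2Vec

# toy corpus: a list of tokenized sentences
sentences = [['the', 'quick', 'brown', 'fox'],
             ['the', 'lazy', 'dog']]

# vector_size is the gensim 4.x name (it was `size` in 3.x)
model = Word2Vec(sentences, vector_size=50, min_count=1)

# save in the word2vec text format for later loading
model.wv.save_word2vec_format('my_vectors.txt')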

Is using pre-trained Embeddings better than using custom trained Embeddings?

This can mean that for solving semantic NLP tasks, when the training set at hand is sufficiently large, it is better to use pre-trained word embeddings.

What is Pretrained word embeddings?

Pretrained Word Embeddings are the embeddings learned in one task that are used for solving another similar task. These embeddings are trained on large datasets, saved, and then used for solving other tasks. That's why pretrained word embeddings are a form of Transfer Learning.


2 Answers

I just wanted to report my findings about loading a gensim embedding with PyTorch.


  • Solution for PyTorch 0.4.0 and newer:

Since v0.4.0 there is a new function, from_pretrained(), which makes loading an embedding very convenient. Here is an example from the documentation:

import torch
import torch.nn as nn

# FloatTensor containing pretrained weights
weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]])
embedding = nn.Embedding.from_pretrained(weight)
# Get embeddings for index 1
input = torch.LongTensor([1])
embedding(input)
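Note that from_pretrained() freezes the weights by default; pass freeze=False if the embedding should stay trainable:

embedding = nn.Embedding.from_pretrained(weight, freeze=False)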

The weights from gensim can easily be obtained by:

import gensim

model = gensim.models.KeyedVectors.load_word2vec_format('path/to/file')
weights = torch.FloatTensor(model.vectors)  # formerly syn0, which is soon deprecated

As noted by @Guglie: in newer gensim versions, when you have a trained Word2Vec model (rather than KeyedVectors loaded directly), the vectors live on model.wv:

weights = torch.FloatTensor(model.wv.vectors)
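Putting the pieces together, a minimal end-to-end sketch; 'path/to/file' and the word 'king' are placeholders, and the vocabulary mapping is model.key_to_index in gensim 4.x (model.vocab['king'].index in older versions):

import gensim
import torch
import torch.nn as nn

model = gensim.models.KeyedVectors.load_word2vec_format('path/to/file')
weights = torch.FloatTensor(model.vectors)
embedding = nn.Embedding.from_pretrained(weights)

# map a word to its row index via gensim's vocabulary, then look it up
idx = torch.LongTensor([model.key_to_index['king']])
vector = embedding(idx)  # shape: (1, embedding_dim)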

  • Solution for PyTorch version 0.3.1 and older:

I'm using version 0.3.1 and from_pretrained() isn't available in this version.

Therefore I created my own from_pretrained so I can also use it with 0.3.1:

def from_pretrained(embeddings, freeze=True):
    assert embeddings.dim() == 2, \
        'Embeddings parameter is expected to be 2-dimensional'
    rows, cols = embeddings.shape
    embedding = torch.nn.Embedding(num_embeddings=rows, embedding_dim=cols)
    embedding.weight = torch.nn.Parameter(embeddings)
    embedding.weight.requires_grad = not freeze
    return embedding

The embedding can then be loaded just like this:

embedding = from_pretrained(weights) 

I hope this is helpful for someone.

answered Sep 21 '22 by MBT

I think it is easy: just copy the embedding weights from gensim into the corresponding weights of the PyTorch embedding layer.

You need to make sure two things are correct: first, the weight shape has to match; second, the weights have to be converted to the PyTorch FloatTensor type.
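A minimal sketch of that manual copy, assuming model was loaded with gensim's KeyedVectors.load_word2vec_format as in the answer above:

import torch
import torch.nn as nn

vocab_size, emb_dim = model.vectors.shape
embedding = nn.Embedding(vocab_size, emb_dim)

# the gensim matrix must be a FloatTensor with the same shape as embedding.weight
weights = torch.FloatTensor(model.vectors)
assert weights.shape == embedding.weight.shape
embedding.weight.data.copy_(weights)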

answered Sep 20 '22 by jdhao