
word2vec - what is best? add, concatenate or average word vectors?

I am working on a recurrent language model. To learn word embeddings that can be used to initialize my language model, I am using gensim's word2vec model. After training, the word2vec model holds two vectors for each word in the vocabulary: the word embedding (rows of input/hidden matrix) and the context embedding (columns of hidden/output matrix).
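For reference, here is a minimal sketch of how both matrices can be read out of a trained gensim model. The attribute names (`wv.vectors`, `syn1neg`) assume gensim 4.x with the default negative sampling; other versions and configurations expose them differently.

```python
from gensim.models import Word2Vec

# Toy corpus standing in for the real language-model training data.
sentences = [["the", "cat", "sat"], ["the", "dog", "barked"]]
model = Word2Vec(sentences, vector_size=50, min_count=1, sg=1, negative=5)

idx = model.wv.key_to_index["cat"]
word_vec = model.wv.vectors[idx]  # "word embedding": row of the input->hidden matrix
ctx_vec = model.syn1neg[idx]      # "context embedding": gensim stores the hidden->output
                                  # matrix transposed, so rows index words here as well
```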

As outlined in this post, there are at least three common ways to combine these two embedding vectors (a code sketch of all three follows the list):

  1. summing the context and word vector for each word
  2. summing and then averaging the context and word vector for each word
  3. concatenating the context and word vector
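Concretely, the sketch referred to above shows the three combinations applied vocabulary-wide; the random matrices are only stand-ins for the two embedding matrices (e.g. `model.wv.vectors` and `model.syn1neg` from the previous snippet):

```python
import numpy as np

# Stand-ins for the word and context embedding matrices, both (vocab_size, dim).
vocab_size, dim = 1000, 50
word_vecs = np.random.rand(vocab_size, dim)
ctx_vecs = np.random.rand(vocab_size, dim)

summed = word_vecs + ctx_vecs                  # 1. sum            -> (vocab_size, dim)
averaged = (word_vecs + ctx_vecs) / 2.0        # 2. sum and average -> (vocab_size, dim)
concatenated = np.concatenate(                 # 3. concatenate     -> (vocab_size, 2 * dim)
    [word_vecs, ctx_vecs], axis=1)
```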

However, I couldn't find proper papers or reports on the best strategy. So my questions are:

  1. Is there an established choice between summing, averaging, and concatenating the vectors?
  2. Or does the best way depend entirely on the task in question? If so, what strategy is best for a word-level language model?
  3. Why combine the vectors at all? Why not use the "original" word embeddings for each word, i.e. those contained in the weight matrix between the input and hidden layers?

Related (but unanswered) questions:

  • word2vec: Summing/concatenate inside and outside vector
  • why we use input-hidden weight matrix to be the word vectors instead of hidden-output weight matrix?
Asked Oct 23 '17 by Lemon


People also ask

Why do you need to average the word vectors for a sentence?

It simply means that the author needed a single vector to represent a tweet so that they could run a classifier (probably). In other words, the averaging of vectors was dictated downstream by a tool that accepted a single vector. Can you post the name of the paper for context?
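For illustration, a minimal sketch of that kind of averaging, assuming a trained gensim `model` and simply skipping out-of-vocabulary tokens:

```python
import numpy as np

def sentence_vector(tokens, model):
    """Average the word vectors of the in-vocabulary tokens into a single vector."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    if not vecs:
        return np.zeros(model.vector_size)
    return np.mean(vecs, axis=0)

# e.g. tweet_vec = sentence_vector(["great", "movie", "tonight"], model)
```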

Is Word2vec better than bag-of-words?

The main difference is that Word2vec produces one vector per word, whereas BoW produces one number per word (a word count). Word2vec is great for digging into documents and identifying content and subsets of content. Its vectors represent each word's context, i.e. the n-grams it appears in.

What is the disadvantage of Word2vec?

Perhaps the biggest problem with word2vec is the inability to handle unknown or out-of-vocabulary (OOV) words. If your model hasn't encountered a word before, it will have no idea how to interpret it or how to build a vector for it. You are then forced to use a random vector, which is far from ideal.
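A rough sketch of the fallback described here; the helper and the random initialization are purely illustrative, not a recommended fix:

```python
import numpy as np

_rng = np.random.default_rng(0)

def lookup(word, model):
    """Return the trained vector for a known word, otherwise a random stand-in."""
    if word in model.wv:
        return model.wv[word]
    # OOV word: fall back to a small random vector, which is far from ideal.
    return _rng.normal(scale=0.01, size=model.vector_size)
```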

What is average word embedding?

AWE is an advanced approach to word embedding that applies a weight to each word in the sentence to circumvent the weakness of simple averaging. Word embeddings are the preferred method of representing words in natural language processing tasks.
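One common form of such weighting is to scale each word vector by a per-word weight (e.g. a TF-IDF or inverse-frequency score) before averaging. A sketch under that assumption, with `word_weight` as a hypothetical dict mapping tokens to scalar weights:

```python
import numpy as np

def weighted_sentence_vector(tokens, model, word_weight):
    """Weighted average of word vectors; word_weight maps a token to a scalar weight."""
    pairs = [(word_weight.get(t, 1.0), model.wv[t]) for t in tokens if t in model.wv]
    if not pairs:
        return np.zeros(model.vector_size)
    weights = np.array([w for w, _ in pairs])
    vecs = np.stack([v for _, v in pairs])
    return (weights[:, None] * vecs).sum(axis=0) / weights.sum()
```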


2 Answers

I found an answer in the Stanford lecture "Deep Learning for Natural Language Processing" (Lecture 2, March 2016). It's available here. At minute 46, Richard Socher states that the common way is to average the two word vectors.

Answered Sep 25 '22 by Lemon


You should read this research work at least once to get the whole idea of combining word embeddings using different algebraic operators. It is my own research.

The paper also covers other methods of combining word vectors.

In short, L1-normalized average word vectors and the sum of word vectors are good representations.
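For concreteness, one plausible reading of an L1-normalized average of word vectors (a sketch only; the paper gives the exact definitions), assuming a trained gensim `model`:

```python
import numpy as np

def l1_normalized_average(tokens, model):
    """Average the in-vocabulary word vectors, then L1-normalize the result."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    if not vecs:
        return np.zeros(model.vector_size)
    avg = np.mean(vecs, axis=0)
    norm = np.abs(avg).sum()
    return avg / norm if norm > 0 else avg
```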

Answered Sep 24 '22 by Nomiluks