How to get vector for a sentence from the word2vec of tokens in sentence

Tags:

word2vec

I have generated vectors for a list of tokens from a large document using word2vec. Given a sentence, is it possible to compute a vector for the whole sentence from the vectors of the tokens in it?

trialcritic asked Apr 21 '15 00:04


People also ask

What is a sentence vector?

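A sentence vector is a single fixed-length vector that represents a whole sentence. In the Paragraph Vector model (Doc2Vec), every paragraph (or sentence) is mapped to a unique vector, and the paragraph vector is combined with the word vectors to predict the next word. Through this training, the paragraph vector can learn what is missing from the current word context, such as the topic of the paragraph, so it acts as a memory for the paragraph.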

Does word2vec work on phrases?

Once phrases are treated as single tokens, word2vec embeds each phrase based on the window of phrases before and after it, just as it does for words. So if the phrases around your target phrase are not meaningful with respect to it, the resulting vectors will not be meaningful either.

How do you convert words into vectors?

Converting words to vectors, or word vectorization, is a natural language processing (NLP) process. It uses language models to map words into a vector space, where each word is represented by a vector of real numbers. This allows words with similar meanings to have similar representations.
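"Similar representations" is usually measured with cosine similarity. A minimal sketch in plain Python; the 3-dimensional vectors are made up for illustration and are not real word2vec output. The same function can also compare sentence vectors built from word vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up 3-d vectors for illustration only; trained word2vec vectors
# usually have a few hundred dimensions.
cat = [0.9, 0.2, 0.0]
dog = [0.8, 0.3, 0.1]
sat = [0.0, 0.7, 0.2]

print(round(cosine_similarity(cat, dog), 2))  # 0.98
print(round(cosine_similarity(cat, sat), 2))  # 0.21
```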


1 Answer

There are different methods to get sentence vectors:

  1. Doc2Vec: train a Doc2Vec model on your dataset and use the sentence vectors it learns.
  2. Average of Word2Vec vectors: take the average of all the word vectors in a sentence. This average vector represents the sentence.
  3. Average of Word2Vec vectors weighted by TF-IDF: this is one of the best approaches, and the one I would recommend. Multiply each word vector by that word's TF-IDF score, then average the weighted vectors; the result represents your sentence.
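Methods 2 and 3 can be sketched in plain Python so the example runs without extra packages. This is a minimal sketch under several assumptions: `word_vectors` holds made-up 3-d vectors standing in for a trained word2vec model (with gensim you would look vectors up in `model.wv`); the helpers `average_vector`, `idf_weights` and `tfidf_weighted_vector` are hypothetical names; each tokenized sentence in `corpus` counts as one document for IDF, using the plain log(N / df) formula; tokens without a vector are skipped.

```python
import math
from collections import Counter

# Hypothetical toy 3-d vectors standing in for a trained word2vec model
# (with gensim you would look these up in model.wv instead).
word_vectors = {
    "the": [0.1, 0.0, 0.1],
    "cat": [0.9, 0.2, 0.0],
    "dog": [0.8, 0.3, 0.1],
    "sat": [0.0, 0.7, 0.2],
    "ran": [0.1, 0.6, 0.4],
}


def average_vector(tokens, vectors):
    """Method 2: plain mean of the vectors of tokens that have one."""
    known = [vectors[t] for t in tokens if t in vectors]  # skip unknown tokens
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def idf_weights(corpus):
    """IDF = log(N / df), treating each tokenized sentence as one document."""
    n_docs = len(corpus)
    doc_freq = Counter(t for doc in corpus for t in set(doc))
    return {t: math.log(n_docs / df) for t, df in doc_freq.items()}


def tfidf_weighted_vector(tokens, vectors, idf):
    """Method 3: scale each word vector by its TF-IDF score, then take the weighted mean."""
    tf = Counter(t for t in tokens if t in vectors)  # term counts in this sentence
    weights = {t: count * idf.get(t, 0.0) for t, count in tf.items()}
    total = sum(weights.values())
    if total == 0:
        return None  # e.g. every known token appears in every document
    dim = len(next(iter(vectors.values())))
    return [sum(w * vectors[t][i] for t, w in weights.items()) / total
            for i in range(dim)]


# Usage: a tiny hypothetical corpus of pre-tokenized sentences.
corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
idf = idf_weights(corpus)
sentence = ["the", "cat", "sat"]
print(average_vector(sentence, word_vectors))
print(tfidf_weighted_vector(sentence, word_vectors, idf))
```

Dividing the weighted sum by the total weight, rather than by the number of words, is one common choice; it only rescales the vector, so cosine comparisons between sentences are unaffected. In this toy corpus "the" appears in every sentence, gets an IDF of 0, and therefore contributes nothing to the TF-IDF version.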
neel answered Sep 21 '22 08:09