
How to combine word embedding vectors into one vector?

I understand the meaning and methods of word embedding (skip-gram, CBOW), and I know that Google's word2vec can produce a vector for a given word. My problem is this: we have a clause containing a subject, object, verb, and so on, where each word has already been embedded with word2vec. How can we combine these word vectors into a single vector that represents the whole clause?

Example: for the clause V = "dog bites man", word embedding gives us V1, V2, V3, mapping to "dog", "bites", and "man" respectively, and we assume V = V1 + V2 + V3. How can we compute V? I would appreciate an explanation using an example with real vectors.
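As an illustration of the combination being asked about, here is a minimal sketch in Python with made-up 4-dimensional vectors standing in for real word2vec output; summing and averaging are the two simplest ways to get one clause vector:

import numpy as np

# Toy 4-dimensional embeddings (made up for illustration; real
# word2vec vectors are typically 300-dimensional).
v1 = np.array([0.2, -0.1, 0.4, 0.3])   # "dog"
v2 = np.array([0.5, 0.0, -0.2, 0.1])   # "bites"
v3 = np.array([0.1, 0.3, 0.2, -0.4])   # "man"

# Summing the word vectors gives one clause vector: V = V1 + V2 + V3.
v_sum = v1 + v2 + v3            # [0.8, 0.2, 0.4, 0.0]

# Averaging keeps the result on the same scale regardless of
# clause length, and is the more common variant in practice.
v_avg = (v1 + v2 + v3) / 3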

Amir asked Jun 27 '17


People also ask

How do you vectorize words?

Word embedding, or word vectorization, is a methodology in NLP for mapping words or phrases from a vocabulary to corresponding vectors of real numbers, which are used to find word predictions and word similarities/semantics. The process of converting words into numbers is called vectorization.

Is word embedding same as Word2Vec?

Word2vec is a technique/model for producing word embeddings for better word representation. It is a natural language processing method that captures a large number of precise syntactic and semantic word relationships.
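As a minimal sketch of this (assuming the gensim library), you can train a Word2Vec model on a toy corpus and look up the vector it learns for a word:

from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
sentences = [
    ["dog", "bites", "man"],
    ["man", "bites", "dog"],
    ["dog", "eats", "meat"],
    ["man", "eats", "food"],
]

# Train a small skip-gram model (sg=1); sg=0 would use CBOW.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(model.wv["dog"])                       # a 50-dimensional numpy array
print(model.wv.most_similar("dog", topn=2))  # nearest words by cosine similarity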

Are word vectors and word embeddings same?

Word Embedding or Word Vector is a numeric vector input that represents a word in a lower-dimensional space. It allows words with similar meaning to have a similar representation. They can also approximate meaning. A word vector with 50 values can represent 50 unique features.

What is the difference between Word2Vec and Doc2Vec?

While Word2Vec computes a feature vector for every word in the corpus, Doc2Vec computes a feature vector for every document in the corpus. The Doc2Vec model is based on Word2Vec, adding only one extra vector (the paragraph ID) to the input.
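For completeness, a minimal Doc2Vec sketch (again assuming gensim); it learns one vector per document directly, rather than combining word vectors afterwards:

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Each document gets a tag; Doc2Vec learns one vector per tag.
docs = [
    TaggedDocument(words=["dog", "bites", "man"], tags=["d0"]),
    TaggedDocument(words=["man", "eats", "food"], tags=["d1"]),
]

model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)

print(model.dv["d0"])                               # vector for a training document
print(model.infer_vector(["dog", "eats", "food"]))  # vector for an unseen clause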


1 Answer

In this paper: https://arxiv.org/pdf/2004.07464.pdf, the authors combine image embeddings and text embeddings by concatenating them:

X = [TE ; IE]   (concatenation, not element-wise addition)

Here X is the fused embedding, with TE and IE the text and image embeddings respectively. If TE and IE each have dimension 2048, X will have length 2 × 2048 = 4096. You can use the fused vector directly, or, if you want to reduce its dimension, apply t-SNE/PCA or the post-processing algorithm in https://arxiv.org/abs/1708.03629 (implemented here: https://github.com/vyraun/Half-Size).
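A minimal sketch of that concatenate-then-reduce pipeline, assuming numpy and scikit-learn, with random arrays as stand-ins for the real embeddings:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-ins for a batch of 100 text and 100 image embeddings,
# 2048 dimensions each (random here, real embeddings in practice).
te = rng.normal(size=(100, 2048))   # text embeddings
ie = rng.normal(size=(100, 2048))   # image embeddings

# Fusion by concatenation: each row becomes 2 * 2048 = 4096-dimensional.
x = np.concatenate([te, ie], axis=1)   # shape (100, 4096)

# Optional dimensionality reduction with PCA; n_components must not
# exceed min(n_samples, n_features), i.e. 100 here.
pca = PCA(n_components=64)
x_reduced = pca.fit_transform(x)       # shape (100, 64)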

Prakhar Gurawa answered Oct 03 '22