
How to train a reverse embedding, like vec2word?

How do you train a neural network to map from a vector representation to one-hot vectors? The example I'm interested in is where the vector representation is the output of a word2vec embedding, and I'd like to map it back onto the individual words in the language used to train the embedding, so I guess this is vec2word?

In a bit more detail: if I understand correctly, a cluster of points in the embedded space represents similar words. So if you sample points from that cluster and use them as input to vec2word, the output should map to similar individual words?

I guess I could do something similar to an encoder-decoder, but does it have to be that complicated or use so many parameters?

There's this TensorFlow tutorial on how to train word2vec, but I can't find any help on doing the reverse. I'm happy to use any deep learning library, and a sampling-based or probabilistic approach is fine.

Thanks a lot for your help, Ajay.

Ajay T asked Apr 20 '17 09:04


1 Answer

The simplest thing you can do is a nearest-neighbor lookup. Given the query feature fq of an unknown word and a reference set of features for known words, R={fr}, find the reference feature fr* that is nearest to fq, and use the word corresponding to fr* as the word for fq.
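A minimal sketch of that lookup with NumPy. The answer doesn't say which distance to use, so cosine similarity here is my assumption (it is a common choice for word2vec vectors). The function name `nearest_words` and the toy `vocab` / `embeddings` data are made up for illustration; in practice `embeddings` would be the trained word2vec matrix, one row per word.

```python
import numpy as np

def nearest_words(query, embeddings, vocab, k=1):
    """Return the k words whose vectors are most similar to `query`.

    Similarity is cosine similarity (an assumption, not from the answer).
    """
    # Normalise each reference vector and the query to unit length,
    # so that a plain dot product equals the cosine similarity.
    emb_norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q_norm = query / np.linalg.norm(query)
    sims = emb_norm @ q_norm
    # Indices of the k highest similarities, best first.
    top = np.argsort(-sims)[:k]
    return [(vocab[i], float(sims[i])) for i in top]

# Hypothetical toy data standing in for a trained embedding matrix.
vocab = ["king", "queen", "apple"]
embeddings = np.array([
    [0.9, 0.1, 0.0],   # fr for "king"
    [0.8, 0.3, 0.0],   # fr for "queen"
    [0.0, 0.1, 0.9],   # fr for "apple"
])

f_q = np.array([0.85, 0.15, 0.05])  # query feature fq
result = nearest_words(f_q, embeddings, vocab, k=2)
print(result)  # "king" ranks first, then "queen"
```

With k > 1 you get a ranked list of similar words rather than a single one, which is close to the "sample from a cluster" idea in the question. If the vectors came from gensim, I believe `KeyedVectors.similar_by_vector` does the same kind of lookup without hand-written code.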

pitfall answered Nov 15 '22 09:11