How to create word vector? I used one hot key to create word vector, but it is very huge and not generalized for similar semantic word. So I have heard about word vector using neural network that finds word similarity and word vector. So I wanted to know how to generate this vector (algorithm) or good material to start creating word vector ?.
Word-vectors or so-called distributed representations have a long history by now, starting perhaps from work of S. Bengio (Bengio, Y., Ducharme, R., & Vincent, P. (2001).A neural probabilistic language model. NIPS.) where he obtained word-vectors as by-product of training neural-net lanuage model.
A lot of researches demonstrated that these vectors do capture semantic relationship between words (see for example http://research.microsoft.com/pubs/206777/338_Paper.pdf). Also this important paper (http://arxiv.org/abs/1103.0398) by Collobert et al, is a good starting point with understanding word vectors, the way they are obtained and used.
Besides word2vec there is a lot of methods to obtain them. Expamples include SENNA embeddings by Collobert et al (http://ronan.collobert.com/senna/), RNN embeddings by T. Mikolov that can be computed using RNNToolkit (http://www.fit.vutbr.cz/~imikolov/rnnlm/) and much more. For English, ready-made embeddings can be downloaded from these web-sites. word2vec really uses skip-gram model (not neural network model). Another fast code for computing word representations is GloVe (http://www-nlp.stanford.edu/projects/glove/). It is an open question whatever deep neural networks are essential for obtaining good embeddings or not.
Depending of your application, you may prefer using different types of word-vectors, so its a good idea to try several popular algorithms and see what works better for you.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With